Patent search: ap:("Alan Gara" OR "Valentina Salapura" OR "Robert W. Wisniewski") AND inv:"Alan Gara", page 6

51.

Patent application
In-Data Path Tracking of Floating Point Exceptions and Store-Based Exception Indication (Pending, published)

Publication number: US20110047358A1

Publication date: 2011-02-24

Application number: US12543614

Filing date: 2009-08-19

Applicants: Alexandre E. Eichenberger, Alan Gara, Michael K. Gschwind

Inventors: Alexandre E. Eichenberger, Alan Gara, Michael K. Gschwind

IPC classification: G06F9/30, G06F9/302

CPC classification: G06F9/30018, G06F9/30021, G06F9/30029, G06F9/30036, G06F9/3013, G06F9/3842, G06F9/3865

Abstract: Mechanisms are provided for tracking exceptions in the execution of vectorized code. A speculative instruction is executed on a vector element of a vector. An exception condition is detected in association with the vector element based on a result of executing the speculative instruction on the vector element. A special exception value is stored in the vector element in a vector register corresponding to the vector, indicative of the exception condition, without invoking an exception handler for the exception condition. The special exception value is propagated with the vector element of the vector through a processor architecture of the processor, without invoking the exception handler for the exception condition. An exception corresponding to the exception condition indicated by the special exception value is generated only in response to a non-speculative instruction being executed that performs a non-speculative operation on the vector element.
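
The deferred-exception idea in this abstract can be sketched in Python. The sketch assumes a hypothetical `ExceptionMarker` object in place of the hardware's special exception value, uses division by zero as the example fault, and models a vector as a Python list; it illustrates the propagate-then-raise behavior only, not the patented encoding.

```python
from dataclasses import dataclass
from typing import Callable, List, Union


@dataclass(frozen=True)
class ExceptionMarker:
    """Hypothetical stand-in for the special exception value held in a vector lane."""
    reason: str


Lane = Union[float, ExceptionMarker]


def _lanewise(op: Callable[[float, float], Lane], a: List[Lane], b: List[Lane]) -> List[Lane]:
    """Apply op lane by lane; a marker in either input is propagated instead of trapping."""
    out: List[Lane] = []
    for x, y in zip(a, b):
        if isinstance(x, ExceptionMarker):
            out.append(x)
        elif isinstance(y, ExceptionMarker):
            out.append(y)
        else:
            out.append(op(x, y))
    return out


def spec_div(a: List[Lane], b: List[Lane]) -> List[Lane]:
    """Speculative divide: a zero divisor leaves a marker in that lane, no exception."""
    return _lanewise(lambda x, y: ExceptionMarker("divide by zero") if y == 0 else x / y, a, b)


def spec_add(a: List[Lane], b: List[Lane]) -> List[Lane]:
    """Speculative add: markers flow through unchanged."""
    return _lanewise(lambda x, y: x + y, a, b)


def commit(vec: List[Lane], mask: List[bool]) -> List[float]:
    """Non-speculative use of the lanes selected by mask; only here does a marker raise."""
    out: List[float] = []
    for value, live in zip(vec, mask):
        if live:
            if isinstance(value, ExceptionMarker):
                raise ArithmeticError(value.reason)
            out.append(value)
    return out
```

For example, dividing [1, 2, 3, 4] by [1, 0, 2, 4] and adding 1 leaves a marker in lane 1. Committing with the mask [True, False, True, True] returns [2.0, 2.5, 2.0] without any exception, while a mask that selects lane 1 raises ArithmeticError.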

52.

Patent application
Optimizing layout of an application on a massively parallel supercomputer (Lapsed)

Publication number: US20060101104A1

Publication date: 2006-05-11

Application number: US10963101

Filing date: 2004-10-12

Applicants: Gyan Bhanot, Alan Gara, Philip Heidelberger, Eoin Lawless, James Sexton, Robert Walkup

Inventors: Gyan Bhanot, Alan Gara, Philip Heidelberger, Eoin Lawless, James Sexton, Robert Walkup

IPC classification: G06F1/16

CPC classification: G06F9/5066

Abstract: A general computer-implemented method and apparatus to optimize problem layout on a massively parallel supercomputer is described. The method takes as input the communication matrix of an arbitrary problem in the form of an array whose entries C(i, j) are the amount of data communicated from domain i to domain j. Given C(i, j), a heuristic map is first implemented, which attempts sequentially to map a domain and its communication neighbors either to the same supercomputer node or to near-neighbor nodes on the supercomputer torus while keeping the number of domains mapped to a supercomputer node constant (as much as possible). Next, a Markov chain of maps is generated from the initial map using Monte Carlo simulation with free energy (cost function) $F = \sum_{i,j} C(i,j)\,H(i,j)$, where H(i, j) is the smallest number of hops on the supercomputer torus between domain i and domain j. In the cases tested, the method was found to produce good mappings and has the potential to be used as a general layout optimization tool for parallel codes. At the moment, the serial code implemented to test the method is unoptimized, so the computation time to find the optimum map can be several hours on a typical PC. For production use, a good parallel implementation of the algorithm would be required, which could itself run on the supercomputer.
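
A minimal Python sketch of the cost function and the Monte Carlo stage follows. It assumes at most one domain per node, a fixed temperature, and moves that swap the nodes of two domains (so per-node occupancy never changes); it starts from a naive placement rather than the heuristic map described above, and `anneal_layout` and its parameters are illustrative names, not the authors' code.

```python
import math
import random
from itertools import product


def torus_hops(a, b, dims):
    """Smallest hop count between node coordinates a and b on a torus with side lengths dims."""
    return sum(min(abs(x - y), n - abs(x - y)) for x, y, n in zip(a, b, dims))


def layout_cost(comm, placement, dims):
    """F = sum over i, j of C(i, j) * H(i, j); placement[i] is the torus node hosting domain i."""
    n = len(comm)
    return sum(comm[i][j] * torus_hops(placement[i], placement[j], dims)
               for i in range(n) for j in range(n) if comm[i][j])


def anneal_layout(comm, dims, steps=20000, temperature=1.0, seed=0):
    """Metropolis Monte Carlo over pairwise swaps; assumes at most one domain per node."""
    rng = random.Random(seed)
    nodes = list(product(*(range(n) for n in dims)))
    if len(comm) > len(nodes):
        raise ValueError("more domains than torus nodes")
    placement = nodes[:len(comm)]          # naive start; the patent starts from a heuristic map
    cost = layout_cost(comm, placement, dims)
    best, best_cost = list(placement), cost
    for _ in range(steps):
        i, j = rng.sample(range(len(comm)), 2)
        placement[i], placement[j] = placement[j], placement[i]
        new_cost = layout_cost(comm, placement, dims)
        delta = new_cost - cost
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            cost = new_cost                 # accept the move
            if cost < best_cost:
                best, best_cost = list(placement), cost
        else:
            placement[i], placement[j] = placement[j], placement[i]  # reject: undo the swap
    return best, best_cost
```

Recomputing F from scratch costs O(n²) per move; a practical version would update only the terms involving the two swapped domains.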

53.

Granted patent
Cache as point of coherence in multiprocessor system (In force)

Publication number: US09507647B2

Publication date: 2016-11-29

Application number: US13008531

Filing date: 2011-01-18

Applicants: Matthias A. Blumrich, Luis H. Ceze, Dong Chen, Alan Gara, Philip Heidelberger, Martin Ohmacht, Burkhard Steinmacher-Burow, Xiaotong Zhuang

Inventors: Matthias A. Blumrich, Luis H. Ceze, Dong Chen, Alan Gara, Philip Heidelberger, Martin Ohmacht, Burkhard Steinmacher-Burow, Xiaotong Zhuang

IPC classification: G06F12/00, G06F13/00, G06F13/28, G06F9/52, G06F12/08

CPC classification: G06F9/524, G06F12/08

Abstract: In a multiprocessor system, a conflict checking mechanism is implemented in the L2 cache memory. Different versions of speculative writes are maintained in different ways of the cache. A record of speculative writes is maintained in the cache directory. Conflict checking occurs as part of directory lookup. Speculative versions that do not conflict are aggregated into an aggregated version in a different way of the cache. Speculative memory access requests do not go to main memory.
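
The following toy model sketches how conflict checking can ride on the directory lookup of one shared-cache set. The conflict rule (a speculative read conflicts with another speculation's write; a speculative write conflicts with another speculation's read or write), the class names, and the way count are assumptions for illustration. The model has no main memory at all, mirroring the point that speculative accesses stay in the cache, and it simplifies aggregation to overwriting the committed copy.

```python
class ConflictError(Exception):
    """Raised when a speculative access conflicts with another speculation's state."""


class SpeculativeCacheSet:
    """Toy model of one shared-cache set: each speculative version of a line gets its own way."""

    def __init__(self, num_ways=8):
        self.num_ways = num_ways
        self.ways = {}     # (tag, version) -> data; version None marks the committed copy
        self.readers = {}  # tag -> set of speculation ids that have read the line

    def _claim_way(self, tag, version):
        if (tag, version) not in self.ways and len(self.ways) >= self.num_ways:
            raise RuntimeError("no free way for another version of this line")

    def _writers(self, tag):
        return {v for (t, v) in self.ways if t == tag and v is not None}

    def fill(self, tag, data):
        """Install a committed (non-speculative) copy of a line."""
        self._claim_way(tag, None)
        self.ways[(tag, None)] = data

    def spec_read(self, tag, spec_id):
        """Directory lookup plus conflict check: reading another speculation's write conflicts."""
        if self._writers(tag) - {spec_id}:
            raise ConflictError(f"read of line {tag:#x} by {spec_id}")
        self.readers.setdefault(tag, set()).add(spec_id)
        # Prefer this speculation's own version, else the committed copy (None on a miss).
        return self.ways.get((tag, spec_id), self.ways.get((tag, None)))

    def spec_write(self, tag, spec_id, data):
        """Write into a private version; conflicts with other readers or writers of the line."""
        if (self._writers(tag) | self.readers.get(tag, set())) - {spec_id}:
            raise ConflictError(f"write of line {tag:#x} by {spec_id}")
        self._claim_way(tag, spec_id)
        self.ways[(tag, spec_id)] = data

    def commit(self, spec_id):
        """Fold a conflict-free speculation's versions into the committed copy of each line."""
        for tag, version in [key for key in self.ways if key[1] == spec_id]:
            self.ways[(tag, None)] = self.ways.pop((tag, version))
        for line_readers in self.readers.values():
            line_readers.discard(spec_id)
```

Speculation ids must not be `None`, because `None` marks the committed copy in this model.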

54.

Granted patent
Cache directory lookup reader set encoding for partial cache line speculation support (In force)

Publication number: US08868837B2

Publication date: 2014-10-21

Application number: US13008602

Filing date: 2011-01-18

Applicants: Alan Gara, Martin Ohmacht

Inventors: Alan Gara, Martin Ohmacht

IPC classification: G06F12/00, G06F13/00, G06F13/28, G06F12/08

CPC classification: G06F9/524, G06F12/08

Abstract: In a multiprocessor system, with conflict checking implemented in a directory lookup of a shared cache memory, a reader set encoding permits dynamic recordation of read accesses. The reader set encoding includes an indication of a portion of a line read, for instance by indicating boundaries of read accesses. Different encodings may apply to different types of speculative execution.
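
One way to read "an indication of a portion of a line read, for instance by indicating boundaries of read accesses" is a single byte range per line that grows to cover every speculative read. The sketch below assumes that encoding, a 128-byte line, and reads that do not cross a line boundary; the class and method names are hypothetical.

```python
LINE_SIZE = 128  # assumed cache-line size in bytes


class PartialReaderSet:
    """Per cache line, keeps one byte range [lo, hi) covering every speculative read of it."""

    def __init__(self, line_size=LINE_SIZE):
        self.line_size = line_size
        self.ranges = {}  # line index -> (lo, hi) byte offsets within the line

    def record_read(self, addr, size):
        """Widen the recorded range to include this read (assumed not to cross a line)."""
        line, off = divmod(addr, self.line_size)
        lo, hi = off, off + size
        if line in self.ranges:
            old_lo, old_hi = self.ranges[line]
            lo, hi = min(lo, old_lo), max(hi, old_hi)
        self.ranges[line] = (lo, hi)

    def write_conflicts(self, addr, size):
        """A write conflicts only if its bytes overlap the recorded read range of the line."""
        line, off = divmod(addr, self.line_size)
        if line not in self.ranges:
            return False
        lo, hi = self.ranges[line]
        return off < hi and lo < off + size
```

With reads of bytes 0 to 7 and 16 to 23 of a line, a write to bytes 64 to 71 passes, while a write to bytes 8 to 11 is flagged even though those bytes were never read. A single merged range trades precision for a compact encoding: it can report false conflicts but never misses a true overlap.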

55.

Granted patent
Optimizing TLB entries for mixed page size storage in contiguous memory (In force)

Publication number: US08856490B2

Publication date: 2014-10-07

Application number: US13618730

Filing date: 2012-09-14

Applicants: Dong Chen, Alan Gara, Mark E. Giampapa, Philip Heidelberger, Jon K. Kriegel, Martin Ohmacht, Burkhard Steinmacher-Burow

Inventors: Dong Chen, Alan Gara, Mark E. Giampapa, Philip Heidelberger, Jon K. Kriegel, Martin Ohmacht, Burkhard Steinmacher-Burow

IPC classification: G06F12/06, G06F12/10

CPC classification: G06F12/1027, G06F2212/652, G06F2212/654

Abstract: A system and method for accessing memory are provided. The system comprises a lookup buffer for storing one or more page table entries, wherein each of the one or more page table entries comprises at least a virtual page number and a physical page number; a logic circuit for receiving a virtual address from a processor, said logic circuit for matching the virtual address to the virtual page number in one of the page table entries to select the physical page number in the same page table entry, said page table entry having one or more bits set to exclude a memory range from a page.
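
The abstract says only that an entry carries bits that exclude a memory range from its page. One way to picture this, assuming those bits encode an explicit offset range [start, end) (the real bit layout is not given here), is a 1 MB page whose first 64 KB is carved out and mapped instead by a separate 64 KB page. `TlbEntry`, its `exclude` field, and `translate` are hypothetical names for this sketch.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class TlbEntry:
    vpn: int                                   # virtual page number (vaddr // page_size)
    ppn: int                                   # physical page number
    page_size: int                             # page size in bytes
    exclude: Optional[Tuple[int, int]] = None  # hypothetical excluded offsets [start, end)


def translate(tlb: List[TlbEntry], vaddr: int) -> Optional[int]:
    """Return the physical address for vaddr, or None on a TLB miss."""
    for entry in tlb:
        if vaddr // entry.page_size != entry.vpn:
            continue                           # virtual page number does not match
        offset = vaddr % entry.page_size
        if entry.exclude and entry.exclude[0] <= offset < entry.exclude[1]:
            continue                           # range carved out of this page; try other entries
        return entry.ppn * entry.page_size + offset
    return None
```

With `big = TlbEntry(vpn=0, ppn=4, page_size=1 << 20, exclude=(0, 1 << 16))` and `small = TlbEntry(vpn=0, ppn=100, page_size=1 << 16)`, address 0x1234 falls in the excluded hole and is translated by the small page, while 0x20000 is translated by the large page.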

56.

Granted patent
List based prefetch (In force)

Publication number: US08806141B2

Publication date: 2014-08-12

Application number: US13593838

Filing date: 2012-08-24

Applicants: Peter Boyle, Norman Christ, Alan Gara, Changhoan Kim, Robert Mawhinney, Martin Ohmacht, Krishnan Sugavanam

Inventors: Peter Boyle, Norman Christ, Alan Gara, Changhoan Kim, Robert Mawhinney, Martin Ohmacht, Krishnan Sugavanam

IPC classification: G06F12/00, G06F12/08

CPC classification: G06F12/0862

Abstract: A list prefetch engine improves the performance of a parallel computing system. The list prefetch engine receives a current cache miss address. The list prefetch engine evaluates whether the current cache miss address is valid. If the current cache miss address is valid, the list prefetch engine compares the current cache miss address and a list address. A list address represents an address in a list. A list describes an arbitrary sequence of prior cache miss addresses. The prefetch engine prefetches data according to the list if there is a match between the current cache miss address and the list address.
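
The matching loop in this abstract can be sketched as follows. The sketch assumes that an invalid miss address is represented as `None`, that the engine searches a small window ahead of its current list position so that a few unrecorded misses do not derail it, and that it prefetches a fixed number of following list entries; the `window` and `depth` parameters are hypothetical.

```python
from typing import List, Optional, Sequence


class ListPrefetcher:
    """Replays a recorded sequence of earlier cache-miss addresses (illustrative sketch)."""

    def __init__(self, recorded_misses: Sequence[int], depth: int = 2, window: int = 4):
        self.lst = list(recorded_misses)  # the list: an arbitrary sequence of prior miss addresses
        self.pos = 0                      # next list position we expect to see
        self.depth = depth                # how many following entries to prefetch (assumed)
        self.window = window              # how far ahead to search for a match (assumed)

    def on_miss(self, addr: Optional[int]) -> List[int]:
        """Return the addresses to prefetch for this miss; empty if invalid or unmatched."""
        if addr is None:                  # invalid miss address: ignore it
            return []
        end = min(self.pos + self.window, len(self.lst))
        for k in range(self.pos, end):
            if self.lst[k] == addr:       # current miss matches a list address
                self.pos = k + 1
                return self.lst[self.pos:self.pos + self.depth]
        return []
```

Because the list can hold any sequence of addresses, this works for irregular access patterns that a stride prefetcher would not detect, as long as the program repeats the same miss sequence.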

57.

Granted patent
Efficiency of static core turn-off in a system-on-a-chip with variation (Lapsed)

Publication number: US08571847B2

Publication date: 2013-10-29

Application number: US12727984

Filing date: 2010-03-19

Applicants: Chen-Yong Cher, Paul W. Coteus, Alan Gara, Eren Kursun, David P. Paulsen, Brian A. Schuelke, John E. Sheets, II, Shurong Tian

Inventors: Chen-Yong Cher, Paul W. Coteus, Alan Gara, Eren Kursun, David P. Paulsen, Brian A. Schuelke, John E. Sheets, II, Shurong Tian

IPC classification: G06G7/75

CPC classification: G06F1/3203, G06F1/206, G06F1/3237, G06F11/24, Y02D10/128, Y02D10/16

Abstract: A processor-implemented method for improving efficiency of a static core turn-off in a multi-core processor with variation, the method comprising: conducting via a simulation a turn-off analysis of the multi-core processor at the multi-core processor's design stage, wherein the turn-off analysis of the multi-core processor at the multi-core processor's design stage includes a first output corresponding to a first multi-core processor core to turn off; conducting a turn-off analysis of the multi-core processor at the multi-core processor's testing stage, wherein the turn-off analysis of the multi-core processor at the multi-core processor's testing stage includes a second output corresponding to a second multi-core processor core to turn off; comparing the first output and the second output to determine if the first output is referring to the same core to turn off as the second output; outputting a third output corresponding to the first multi-core processor core if the first output and the second output are both referring to the same core to turn off.
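
The comparison step of this method reduces to a few lines. The sketch assumes that each stage's analysis picks the core with the highest power (simulated at design time, measured at test time), a criterion the abstract does not specify, and returns `None` when the two stages disagree, a case the abstract leaves open.

```python
from typing import Dict, Optional


def core_to_turn_off(power_by_core: Dict[int, float]) -> int:
    """Assumed criterion: turn off the core with the highest power."""
    return max(power_by_core, key=power_by_core.get)


def choose_core(simulated: Dict[int, float], measured: Dict[int, float]) -> Optional[int]:
    first = core_to_turn_off(simulated)   # first output: design-stage (simulated) analysis
    second = core_to_turn_off(measured)   # second output: testing-stage (measured) analysis
    # Third output only when both stages point at the same core.
    return first if first == second else None
```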

58.

Granted patent
Ordering of guarded and unguarded stores for no-sync I/O (Lapsed)

Publication number: US08473683B2

Publication date: 2013-06-25

Application number: US12986349

Filing date: 2011-01-07

Applicants: Alan Gara, Martin Ohmacht

Inventors: Alan Gara, Martin Ohmacht

IPC classification: G06F12/12

CPC classification: G06F12/0811, G06F9/30087, G06F9/3834, G06F9/3842, G06F12/0808

Abstract: A parallel computing system processes at least one store instruction. A first processor core issues a store instruction. A first queue, associated with the first processor core, stores the store instruction. A second queue, associated with a first local cache memory device of the first processor core, stores the store instruction. The first processor core updates first data in the first local cache memory device according to the store instruction. A third queue, associated with at least one shared cache memory device, stores the store instruction. The first processor core invalidates second data, associated with the store instruction, in the at least one shared cache memory. The first processor core invalidates third data, associated with the store instruction, in other local cache memory devices of other processor cores. The first processor core flushes only the first queue.

59.

Patent application
COLLECTIVE NETWORK FOR COMPUTER STRUCTURES (In force)

Publication number: US20110219280A1

Publication date: 2011-09-08

Application number: US13101566

Filing date: 2011-05-05

Applicants: Matthias A. Blumrich, Paul W. Coteus, Dong Chen, Alan Gara, Mark E. Giampapa, Philip Heidelberger, Dirk Hoenicke, Todd E. Takken, Burkhard D. Steinmacher-Burow, Pavlos M. Vranas

Inventors: Matthias A. Blumrich, Paul W. Coteus, Dong Chen, Alan Gara, Mark E. Giampapa, Philip Heidelberger, Dirk Hoenicke, Todd E. Takken, Burkhard D. Steinmacher-Burow, Pavlos M. Vranas

IPC classification: H03M13/09, H04L1/08, G06F11/10, G06F11/14

CPC classification: H04L1/08, G06F9/46, G06F11/08, G06F11/1423, H03M13/09, H04L1/0061, H04L1/1607, H04L1/1867, H04L2001/0093, H04L2001/0097

Abstract: A system and method for enabling high-speed, low-latency global collective communications among interconnected processing nodes. The global collective network optimally enables collective reduction operations to be performed during parallel algorithm operations executing in a computer structure having a plurality of the interconnected processing nodes. Router devices are included that interconnect the nodes of the network via links to facilitate performance of low-latency global processing operations at nodes of the virtual network and class structures. The global collective network may be configured to provide global barrier and interrupt functionality in an asynchronous or synchronized manner. When implemented in a massively parallel supercomputing structure, the global collective network is physically and logically partitionable according to the needs of a processing algorithm.
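
The data flow of a collective reduction over a tree can be sketched in software. The sketch assumes a binary tree over ranks rooted at rank 0 and an associative, commutative combining operator; in the network described here the combining is done by the router hardware, which this Python model does not attempt to capture.

```python
from operator import add
from typing import Callable, List, TypeVar

T = TypeVar("T")


def tree_allreduce(values: List[T], op: Callable[[T, T], T] = add) -> List[T]:
    """Combine values up a binary tree rooted at rank 0, then broadcast the result down."""
    if not values:
        return []
    partial = list(values)
    # Reduce phase: visit ranks from the highest down, so each node has already
    # absorbed its children's subtotals before it forwards its own to the parent.
    for rank in range(len(partial) - 1, 0, -1):
        parent = (rank - 1) // 2
        partial[parent] = op(partial[parent], partial[rank])
    # Broadcast phase: every rank receives the root's combined value.
    return [partial[0]] * len(partial)
```

Processing ranks from highest to lowest is enough because the children of rank r are ranks 2r + 1 and 2r + 2, so every child has added its subtotal before its parent forwards one.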

60.

Patent application
READER SET ENCODING FOR DIRECTORY OF SHARED CACHE MEMORY IN MULTIPROCESSOR SYSTEM (Lapsed)

Publication number: US20110219191A1

Publication date: 2011-09-08

Application number: US13008583

Filing date: 2011-01-18

Applicants: Daniel Ahn, Luis H. Ceze, Alan Gara, Martin Ohmacht, Zhuang Xiaotong

Inventors: Daniel Ahn, Luis H. Ceze, Alan Gara, Martin Ohmacht, Zhuang Xiaotong

IPC classification: G06F12/08

CPC classification: G06F9/524, G06F12/08

Abstract: In a parallel processing system with speculative execution, conflict checking occurs in a directory lookup of a cache memory that is shared by all processors. In each case, the same physical memory address will map to the same set of that cache, no matter which processor originated that access. The directory includes a dynamic reader set encoding, indicating which speculative threads have read a particular line. This reader set encoding is used in conflict checking. A bitset encoding is used to specify particular threads that have read the line.
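
A bitset reader set can be sketched with one integer per line, where bit t is set once thread t has speculatively read that line. The conflict rule (a write conflicts if any other thread's bit is set) and the names below are assumptions for illustration.

```python
class ReaderBitset:
    """Per-line reader set encoded as a bitset: bit t is set if thread t read the line."""

    def __init__(self):
        self.readers = {}  # line address -> int bitmask

    def record_read(self, line: int, thread: int) -> None:
        self.readers[line] = self.readers.get(line, 0) | (1 << thread)

    def write_conflicts(self, line: int, thread: int) -> bool:
        """A speculative write conflicts if any thread other than the writer has read the line."""
        return bool(self.readers.get(line, 0) & ~(1 << thread))

    def clear_thread(self, thread: int) -> None:
        """Drop a thread's bits, e.g. when its speculation commits or aborts."""
        for line in self.readers:
            self.readers[line] &= ~(1 << thread)
```

A bitset names every reader exactly, at the cost of one bit per speculative thread per line, unlike a coarser encoding that only records whether some thread has read the line.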
