-
Publication No.: US09081501B2
Publication date: 2015-07-14
Application No.: US13004007
Filing date: 2011-01-10
Applicant: Sameh Asaad , Ralph E. Bellofatto , Michael A. Blocksome , Matthias A. Blumrich , Peter Boyle , Jose R. Brunheroto , Dong Chen , Chen-Yong Cher , George L. Chiu , Norman Christ , Paul W. Coteus , Kristan D. Davis , Gabor J. Dozsa , Alexandre E. Eichenberger , Noel A. Eisley , Matthew R. Ellavsky , Kahn C. Evans , Bruce M. Fleischer , Thomas W. Fox , Alan Gara , Mark E. Giampapa , Thomas M. Gooding , Michael K. Gschwind , John A. Gunnels , Shawn A. Hall , Rudolf A. Haring , Philip Heidelberger , Todd A. Inglett , Brant L. Knudson , Gerard V. Kopcsay , Sameer Kumar , Amith R. Mamidala , James A. Marcella , Mark G. Megerian , Douglas R. Miller , Samuel J. Miller , Adam J. Muff , Michael B. Mundy , John K. O'Brien , Kathryn M. O'Brien , Martin Ohmacht , Jeffrey J. Parker , Ruth J. Poole , Joseph D. Ratterman , Valentina Salapura , David L. Satterfield , Robert M. Senger , Brian Smith , Burkhard Steinmacher-Burow , William M. Stockdell , Craig B. Stunkel , Krishnan Sugavanam , Yutaka Sugawara , Todd E. Takken , Barry M. Trager , James L. Van Oosten , Charles D. Wait , Robert E. Walkup , Alfred T. Watson , Robert W. Wisniewski , Peng Wu
Inventor: Sameh Asaad , Ralph E. Bellofatto , Michael A. Blocksome , Matthias A. Blumrich , Peter Boyle , Jose R. Brunheroto , Dong Chen , Chen-Yong Cher , George L. Chiu , Norman Christ , Paul W. Coteus , Kristan D. Davis , Gabor J. Dozsa , Alexandre E. Eichenberger , Noel A. Eisley , Matthew R. Ellavsky , Kahn C. Evans , Bruce M. Fleischer , Thomas W. Fox , Alan Gara , Mark E. Giampapa , Thomas M. Gooding , Michael K. Gschwind , John A. Gunnels , Shawn A. Hall , Rudolf A. Haring , Philip Heidelberger , Todd A. Inglett , Brant L. Knudson , Gerard V. Kopcsay , Sameer Kumar , Amith R. Mamidala , James A. Marcella , Mark G. Megerian , Douglas R. Miller , Samuel J. Miller , Adam J. Muff , Michael B. Mundy , John K. O'Brien , Kathryn M. O'Brien , Martin Ohmacht , Jeffrey J. Parker , Ruth J. Poole , Joseph D. Ratterman , Valentina Salapura , David L. Satterfield , Robert M. Senger , Brian Smith , Burkhard Steinmacher-Burow , William M. Stockdell , Craig B. Stunkel , Krishnan Sugavanam , Yutaka Sugawara , Todd E. Takken , Barry M. Trager , James L. Van Oosten , Charles D. Wait , Robert E. Walkup , Alfred T. Watson , Robert W. Wisniewski , Peng Wu
IPC: G06F15/173 , G06F9/06 , G06F15/76
CPC classification number: G06F13/287 , G06F9/06 , G06F9/3004 , G06F9/30047 , G06F9/3885 , G06F12/0811 , G06F12/0831 , G06F12/0862 , G06F12/0864 , G06F12/1027 , G06F15/17381 , G06F15/17387 , G06F15/76 , G06F15/8069 , G06F2212/1016 , G06F2212/602 , G06F2212/6022 , G06F2212/6024 , G06F2212/6032 , Y02D10/13 , Y02D10/14
Abstract: A multi-petascale, highly efficient parallel supercomputer capable of 100 petaOPS-scale computing at decreased cost, power, and footprint, which allows for a maximum packaging density of processing nodes from an interconnect point of view. The supercomputer exploits technological advances in VLSI that enable a computing model in which many processors can be integrated into a single Application Specific Integrated Circuit (ASIC). Each ASIC computing node comprises a system-on-chip ASIC utilizing four or more processors integrated into one die, each having full access to all system resources. This enables adaptive partitioning of the processors to functions such as compute or messaging I/O on an application-by-application basis and, preferably, adaptive partitioning of functions in accordance with various algorithmic phases within an application; if I/O or other processors are underutilized, they can participate in computation or communication. Nodes are interconnected by a five-dimensional torus network with DMA that maximizes the throughput of packet communications between nodes and minimizes latency.
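The five-dimensional torus interconnect named in the abstract can be illustrated with a minimal sketch. The function names `torus_neighbors` and `hop_distance`, the coordinate tuples, and the example extents are assumptions made for illustration; the abstract does not specify the addressing scheme or the machine dimensions.

```python
from typing import List, Tuple

Coord = Tuple[int, ...]  # one coordinate per torus dimension (five here)


def torus_neighbors(node: Coord, dims: Coord) -> List[Coord]:
    """Return the distinct nearest neighbors of `node` in a torus of size `dims`."""
    neighbors: List[Coord] = []
    for axis, extent in enumerate(dims):
        for step in (-1, 1):
            coord = list(node)
            # Wrap around at the edges: this is what makes it a torus, not a mesh.
            coord[axis] = (coord[axis] + step) % extent
            candidate = tuple(coord)
            # Extents of 1 or 2 produce self-links or duplicate links; skip them.
            if candidate != node and candidate not in neighbors:
                neighbors.append(candidate)
    return neighbors


def hop_distance(a: Coord, b: Coord, dims: Coord) -> int:
    """Minimum number of hops between two nodes, taking wraparound links into account."""
    total = 0
    for x, y, extent in zip(a, b, dims):
        delta = abs(x - y)
        total += min(delta, extent - delta)
    return total


if __name__ == "__main__":
    dims = (4, 4, 4, 4, 2)  # hypothetical extents, not taken from the patent
    print(torus_neighbors((0, 0, 0, 0, 0), dims))
    print(hop_distance((0, 0, 0, 0, 0), (3, 2, 0, 1, 1), dims))
```

With every extent at least 3, each node has ten neighbors (two per dimension), and the wraparound links halve the worst-case distance along each dimension compared with a mesh of the same size.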
-
Publication No.: US08806141B2
Publication date: 2014-08-12
Application No.: US13593838
Filing date: 2012-08-24
Applicant: Peter Boyle , Norman Christ , Alan Gara , Changhoan Kim , Robert Mawhinney , Martin Ohmacht , Krishnan Sugavanam
Inventor: Peter Boyle , Norman Christ , Alan Gara , Changhoan Kim , Robert Mawhinney , Martin Ohmacht , Krishnan Sugavanam
CPC classification number: G06F12/0862
Abstract: A list prefetch engine improves the performance of a parallel computing system. The list prefetch engine receives a current cache miss address and evaluates whether it is valid. If the current cache miss address is valid, the list prefetch engine compares it with a list address. A list address represents an address in a list, and a list describes an arbitrary sequence of prior cache miss addresses. If the current cache miss address matches the list address, the prefetch engine prefetches data according to the list.
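The matching step described in this abstract can be sketched in software as follows. This is a simplified model under stated assumptions, not the patented hardware design: the class name `ListPrefetcher`, the `depth` and `window` parameters, the validity test, and the `prefetch` callback are all hypothetical.

```python
from typing import Callable, List, Optional


class ListPrefetcher:
    """Replays a recorded sequence of cache miss addresses to drive prefetches."""

    def __init__(self, recorded_misses: List[int],
                 prefetch: Callable[[int], None],
                 depth: int = 2, window: int = 4) -> None:
        self.recorded = list(recorded_misses)  # the "list" of prior miss addresses
        self.prefetch = prefetch               # hypothetical hook that issues a prefetch
        self.depth = depth                     # assumed: how far ahead of the match to prefetch
        self.window = window                   # assumed: how many list entries to compare against
        self.pos = 0                           # index of the next expected list address
        self.issued_upto = 0                   # entries before this index were already prefetched

    @staticmethod
    def _is_valid(address: Optional[int]) -> bool:
        # Assumption: "valid" means a non-negative integer address.
        return isinstance(address, int) and address >= 0

    def on_miss(self, address: Optional[int]) -> List[int]:
        """Handle one cache miss; return the addresses prefetched in response."""
        if not self._is_valid(address):
            return []
        end = min(self.pos + self.window, len(self.recorded))
        for i in range(self.pos, end):
            if self.recorded[i] == address:      # match against a list address
                self.pos = i + 1
                start = max(self.pos, self.issued_upto)
                stop = min(self.pos + self.depth, len(self.recorded))
                batch = self.recorded[start:stop]
                for target in batch:
                    self.prefetch(target)
                self.issued_upto = max(self.issued_upto, stop)
                return batch
        return []                                # no match: nothing is prefetched


if __name__ == "__main__":
    issued: List[int] = []
    engine = ListPrefetcher([0x100, 0x340, 0x080, 0x7C0, 0x200], issued.append)
    engine.on_miss(0x100)   # matches the first list entry
    engine.on_miss(0x999)   # not in the list, so it is ignored
    engine.on_miss(0x340)   # matches the next entry
    print([hex(a) for a in issued])
```

Because the list is an arbitrary recorded sequence rather than a fixed stride, this approach can cover irregular access patterns that repeat, such as the same sparse traversal executed on every iteration of a solver.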
-
Publication No.: US20110219208A1
Publication date: 2011-09-08
Application No.: US13004007
Filing date: 2011-01-10
Applicant: Sameh Asaad , Ralph E. Bellofatto , Michael A. Blocksome , Matthias A. Blumrich , Peter Boyle , Jose R. Brunheroto , Dong Chen , Chen-Yong Cher , George L. Chiu , Norman Christ , Paul W. Coteus , Kristan D. Davis , Gabor J. Dozsa , Alexandre E. Eichenberger , Noel A. Eisley , Matthew R. Ellavsky , Kahn C. Evans , Bruce M. Fleischer , Thomas W. Fox , Alan Gara , Mark E. Giampapa , Thomas M. Gooding , Michael K. Gschwind , John A. Gunnels , Shawn A. Hall , Rudolf A. Haring , Philip Heidelberger , Todd A. Inglett , Brant L. Knudson , Gerard V. Kopcsay , Sameer Kumar , Amith R. Mamidala , James A. Marcella , Mark G. Megerian , Douglas R. Miller , Samuel J. Miller , Adam J. Muff , Michael B. Mundy , John K. O'Brien , Kathryn M. O'Brien , Martin Ohmacht , Jeffrey J. Parker , Ruth J. Poole , Joseph D. Ratterman , Valentina Salapura , David L. Satterfield , Robert M. Senger , Brian Smith , Burkhard Steinmacher-Burow , William M. Stockdell , Craig B. Stunkel , Krishnan Sugavanam , Yutaka Sugawara , Todd E. Takken , Barry M. Trager , James L. Van Oosten , Charles D. Wait , Robert E. Walkup , Alfred T. Watson , Robert W. Wisniewski , Peng Wu
Inventor: Sameh Asaad , Ralph E. Bellofatto , Michael A. Blocksome , Matthias A. Blumrich , Peter Boyle , Jose R. Brunheroto , Dong Chen , Chen-Yong Cher , George L. Chiu , Norman Christ , Paul W. Coteus , Kristan D. Davis , Gabor J. Dozsa , Alexandre E. Eichenberger , Noel A. Eisley , Matthew R. Ellavsky , Kahn C. Evans , Bruce M. Fleischer , Thomas W. Fox , Alan Gara , Mark E. Giampapa , Thomas M. Gooding , Michael K. Gschwind , John A. Gunnels , Shawn A. Hall , Rudolf A. Haring , Philip Heidelberger , Todd A. Inglett , Brant L. Knudson , Gerard V. Kopcsay , Sameer Kumar , Amith R. Mamidala , James A. Marcella , Mark G. Megerian , Douglas R. Miller , Samuel J. Miller , Adam J. Muff , Michael B. Mundy , John K. O'Brien , Kathryn M. O'Brien , Martin Ohmacht , Jeffrey J. Parker , Ruth J. Poole , Joseph D. Ratterman , Valentina Salapura , David L. Satterfield , Robert M. Senger , Brian Smith , Burkhard Steinmacher-Burow , William M. Stockdell , Craig B. Stunkel , Krishnan Sugavanam , Yutaka Sugawara , Todd E. Takken , Barry M. Trager , James L. Van Oosten , Charles D. Wait , Robert E. Walkup , Alfred T. Watson , Robert W. Wisniewski , Peng Wu
CPC classification number: G06F13/287 , G06F9/06 , G06F9/3004 , G06F9/30047 , G06F9/3885 , G06F12/0811 , G06F12/0831 , G06F12/0862 , G06F12/0864 , G06F12/1027 , G06F15/17381 , G06F15/17387 , G06F15/76 , G06F15/8069 , G06F2212/1016 , G06F2212/602 , G06F2212/6022 , G06F2212/6024 , G06F2212/6032 , Y02D10/13 , Y02D10/14
Abstract: A multi-petascale, highly efficient parallel supercomputer capable of 100 petaOPS-scale computing at decreased cost, power, and footprint, which allows for a maximum packaging density of processing nodes from an interconnect point of view. The supercomputer exploits technological advances in VLSI that enable a computing model in which many processors can be integrated into a single Application Specific Integrated Circuit (ASIC). Each ASIC computing node comprises a system-on-chip ASIC utilizing four or more processors integrated into one die, each having full access to all system resources. This enables adaptive partitioning of the processors to functions such as compute or messaging I/O on an application-by-application basis and, preferably, adaptive partitioning of functions in accordance with various algorithmic phases within an application; if I/O or other processors are underutilized, they can participate in computation or communication. Nodes are interconnected by a five-dimensional torus network with DMA that maximizes the throughput of packet communications between nodes and minimizes latency.
-
Publication No.: US20110173397A1
Publication date: 2011-07-14
Application No.: US12684693
Filing date: 2010-01-08
Applicant: Peter Boyle , Norman Christ , Alan Gara , Robert Mawhinney , Martin Ohmacht , Krishnan Sugavanam
Inventor: Peter Boyle , Norman Christ , Alan Gara , Robert Mawhinney , Martin Ohmacht , Krishnan Sugavanam
CPC classification number: G06F12/0862 , G06F2212/6026
Abstract: A stream prefetch engine performs data retrieval in a parallel computing system. The engine receives a load request from at least one processor. The engine evaluates whether a first memory address requested in the load request is present and valid in a table. If the first memory address is present and valid in the table, the engine checks whether valid data corresponding to that address exists in an array. If there is not yet valid data corresponding to the first memory address in the array, the engine increments the prefetching depth of the first stream to which the address belongs and fetches the cache line associated with the address from the at least one cache memory device. The engine determines whether prefetching of additional data is needed for the first stream within its prefetching depth, and prefetches the additional data if so.
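The control flow in this abstract can be modeled with a short sketch. It is a software approximation under several assumptions rather than the patented design: the 128-byte line size, the `max_depth` cap, the `pending` queue, the `deliver` method that simulates prefetched data arriving later, and all names are hypothetical. The table and array are unbounded dictionaries here, whereas real hardware would use fixed-size structures.

```python
from typing import Callable, Dict, List

LINE = 128  # assumed cache line size in bytes


class StreamPrefetcher:
    """Table of tracked line addresses plus an array of data that has arrived."""

    def __init__(self, fetch_line: Callable[[int], object], max_depth: int = 8) -> None:
        self.fetch_line = fetch_line         # hypothetical hook: line address -> data
        self.max_depth = max_depth           # assumed cap on the prefetching depth
        self.table: Dict[int, int] = {}      # line address -> stream id (present means valid)
        self.array: Dict[int, object] = {}   # line address -> data already fetched
        self.depth: Dict[int, int] = {}      # stream id -> current prefetching depth
        self.pending: List[int] = []         # prefetches issued but not yet delivered

    def deliver(self) -> None:
        """Simulate the arrival of all outstanding prefetched lines."""
        for line in self.pending:
            self.array[line] = self.fetch_line(line)
        self.pending.clear()

    def load(self, address: int) -> object:
        line = address - address % LINE
        if line not in self.table:
            # Unknown address: start a new stream and fetch the line on demand.
            stream = line
            self.table[line] = stream
            self.depth[stream] = 1
            self.array[line] = self.fetch_line(line)
        else:
            stream = self.table[line]
            if line not in self.array:
                # Tracked, but the data has not arrived: the stream is running
                # ahead of the prefetcher, so deepen it and fetch the line now.
                self.depth[stream] = min(self.depth[stream] + 1, self.max_depth)
                if line in self.pending:
                    self.pending.remove(line)
                self.array[line] = self.fetch_line(line)
        # Prefetch any additional lines needed within the stream's depth.
        for k in range(1, self.depth[stream] + 1):
            nxt = line + k * LINE
            if nxt not in self.table:
                self.table[nxt] = stream
                self.pending.append(nxt)
        return self.array[line]


if __name__ == "__main__":
    engine = StreamPrefetcher(lambda line: ("line", line))
    engine.load(0)      # new stream, depth 1, prefetch of line 128 is queued
    engine.load(128)    # tracked but not delivered yet, so depth grows to 2
    engine.deliver()
    engine.load(256)    # data is already present, depth is unchanged
    print(engine.depth, engine.pending)
```

The depth only grows when a load catches up with an outstanding prefetch, so a stream that is consumed slowly keeps a shallow depth and does not waste bandwidth on lines far ahead of the reader.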
-
Publication No.: US20110119426A1
Publication date: 2011-05-19
Application No.: US12696825
Filing date: 2010-01-29
Applicant: Peter Boyle , Norman Christ , Alan Gara , Changhoan Kim , Robert Mawhinney , Martin Ohmacht , Krishnan Sugavanam
Inventor: Peter Boyle , Norman Christ , Alan Gara , Changhoan Kim , Robert Mawhinney , Martin Ohmacht , Krishnan Sugavanam
CPC classification number: G06F12/0862
Abstract: A list prefetch engine improves the performance of a parallel computing system. The list prefetch engine receives a current cache miss address and evaluates whether it is valid. If the current cache miss address is valid, the list prefetch engine compares it with a list address. A list address represents an address in a list, and a list describes an arbitrary sequence of prior cache miss addresses. If the current cache miss address matches the list address, the prefetch engine prefetches data according to the list.
-
Publication No.: US20120324142A1
Publication date: 2012-12-20
Application No.: US13593838
Filing date: 2012-08-24
Applicant: Peter Boyle , Norman Christ , Alan Gara , Changhoan Kim , Robert Mawhinney , Martin Ohmacht , Krishnan Sugavanam
Inventor: Peter Boyle , Norman Christ , Alan Gara , Changhoan Kim , Robert Mawhinney , Martin Ohmacht , Krishnan Sugavanam
CPC classification number: G06F12/0862
Abstract: A list prefetch engine improves the performance of a parallel computing system. The list prefetch engine receives a current cache miss address and evaluates whether it is valid. If the current cache miss address is valid, the list prefetch engine compares it with a list address. A list address represents an address in a list, and a list describes an arbitrary sequence of prior cache miss addresses. If the current cache miss address matches the list address, the prefetch engine prefetches data according to the list.
-
Publication No.: US08347039B2
Publication date: 2013-01-01
Application No.: US12684693
Filing date: 2010-01-08
Applicant: Peter Boyle , Norman Christ , Alan Gara , Robert Mawhinney , Martin Ohmacht , Krishnan Sugavanam
Inventor: Peter Boyle , Norman Christ , Alan Gara , Robert Mawhinney , Martin Ohmacht , Krishnan Sugavanam
IPC: G06F12/08
CPC classification number: G06F12/0862 , G06F2212/6026
Abstract: A stream prefetch engine performs data retrieval in a parallel computing system. The engine receives a load request from at least one processor. The engine evaluates whether a first memory address requested in the load request is present and valid in a table. If the first memory address is present and valid in the table, the engine checks whether valid data corresponding to that address exists in an array. If there is not yet valid data corresponding to the first memory address in the array, the engine increments the prefetching depth of the first stream to which the address belongs and fetches the cache line associated with the address from the at least one cache memory device. The engine determines whether prefetching of additional data is needed for the first stream within its prefetching depth, and prefetches the additional data if so.
-
Publication No.: US08255633B2
Publication date: 2012-08-28
Application No.: US12696825
Filing date: 2010-01-29
Applicant: Peter Boyle , Norman Christ , Alan Gara , Changhoan Kim , Robert Mawhinney , Martin Ohmacht , Krishnan Sugavanam
Inventor: Peter Boyle , Norman Christ , Alan Gara , Changhoan Kim , Robert Mawhinney , Martin Ohmacht , Krishnan Sugavanam
CPC classification number: G06F12/0862
Abstract: A list prefetch engine improves the performance of a parallel computing system. The list prefetch engine receives a current cache miss address and evaluates whether it is valid. If the current cache miss address is valid, the list prefetch engine compares it with a list address. A list address represents an address in a list, and a list describes an arbitrary sequence of prior cache miss addresses. If the current cache miss address matches the list address, the prefetch engine prefetches data according to the list.