-
1.
Publication No.: US10417175B2
Publication Date: 2019-09-17
Application No.: US15859466
Filing Date: 2017-12-30
Applicant: Intel Corporation
Inventor: Kermin E. Fleming , Simon C. Steely, Jr. , Kent D. Glossop
IPC: G06F15/16 , G06F15/173 , G06F9/54
Abstract: Methods and apparatuses relating to consistency in an accelerator are described. In one embodiment, request address file (RAF) circuits are coupled to a spatial array by a first network, a memory is coupled to the RAF circuits by a second network, a RAF circuit is to not issue, into the second network, a request to the memory marked with a program order dependency on a previous request until receiving a first token generated by completion of the previous request to the memory by another RAF circuit, and a second RAF circuit is to not issue, into the second network, a second request to the memory marked with a program order dependency on a first request until receiving a second token sent by a first RAF circuit when a predetermined time period has lapsed since the first request was issued by the first RAF circuit into the second network.
-
Publication No.: US10387319B2
Publication Date: 2019-08-20
Application No.: US15640534
Filing Date: 2017-07-01
Applicant: Intel Corporation
Inventor: Michael C. Adler , Chiachen Chou , Neal C. Crago , Kermin Fleming , Kent D. Glossop , Aamer Jaleel , Pratik M. Marolia , Simon C. Steely, Jr. , Samantika S. Sury
IPC: G06F12/0802 , G06F15/00 , G06F12/0862 , H03K19/177 , G06F15/78 , G11C8/12 , G06F17/50 , G06F15/80
Abstract: Systems, methods, and apparatuses relating to a configurable spatial accelerator are described. In one embodiment, a processor includes a plurality of processing elements; and an interconnect network between the plurality of processing elements to receive an input of a dataflow graph comprising a plurality of nodes, wherein the dataflow graph is to be overlaid into the interconnect network and the plurality of processing elements with each node represented as a dataflow operator in the plurality of processing elements, and the plurality of processing elements is to perform an operation when an incoming operand set arrives at the plurality of processing elements. The processor also includes a streamer element to prefetch the incoming operand set from two or more levels of a memory system.
-
Publication No.: US10346145B2
Publication Date: 2019-07-09
Application No.: US15632123
Filing Date: 2017-06-23
Applicant: INTEL CORPORATION
Inventor: Yongzhi Zhang , Kent D. Glossop
Abstract: Compilers for compiling computer programs and apparatuses including compilers are disclosed herein. A compiler may include one or more analyzers to parse and analyze source instructions of a computer program including identification of nested loops of the computer program. The compiler may also include a code generator coupled to the one or more analyzers to generate and output executable code for the computer program that executes on a data flow machine, including a data flow graph, based at least in part on results of the analysis. In embodiments, the executable code may include executable code that recursively computes predicates of identified nested loops for use to generate control signal for the data flow graph to allow execution of each loop to start when the loop's predicate is available, independent of whether any other loop is in execution or not. Other embodiments may be disclosed or claimed.
-
Publication No.: US11593295B2
Publication Date: 2023-02-28
Application No.: US17550875
Filing Date: 2021-12-14
Applicant: Intel Corporation
Inventor: Kermin E. Fleming, Jr. , Simon C. Steely, Jr. , Kent D. Glossop , Mitchell Diamond , Benjamin Keen , Dennis Bradford , Fabrizio Petrini , Barry Tannenbaum , Yongzhi Zhang
Abstract: Systems, methods, and apparatuses relating to operations in a configurable spatial accelerator are described. In one embodiment, a configurable spatial accelerator includes a first processing element that includes a configuration register within the first processing element to store a configuration value that causes the first processing element to perform an operation according to the configuration value, a plurality of input queues, an input controller to control enqueue and dequeue of values into the plurality of input queues according to the configuration value, a plurality of output queues, and an output controller to control enqueue and dequeue of values into the plurality of output queues according to the configuration value.
-
Publication No.: US10558575B2
Publication Date: 2020-02-11
Application No.: US15396402
Filing Date: 2016-12-30
Applicant: INTEL CORPORATION
Inventor: Kermin E. Fleming, Jr. , Kent D. Glossop , Simon C. Steely, Jr. , Jinjie Tang , Alan G. Gara
IPC: G06F12/08 , G06F9/30 , G06F9/38 , G06F12/0862 , G06F12/0842 , G06F12/0875
Abstract: Systems, methods, and apparatuses relating to a configurable spatial accelerator are described. In one embodiment, a processor includes a core with a decoder to decode an instruction into a decoded instruction and an execution unit to execute the decoded instruction to perform a first operation; a plurality of processing elements; and an interconnect network between the plurality of processing elements to receive an input of a dataflow graph comprising a plurality of nodes, wherein the dataflow graph is to be overlaid into the interconnect network and the plurality of processing elements with each node represented as a dataflow operator in the plurality of processing elements, and the plurality of processing elements is to perform a second operation when an incoming operand set arrives at the plurality of processing elements.
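The firing rule in this abstract (a dataflow operator performs its operation when an incoming operand set arrives) can be illustrated with a small software model. This is only a sketch under stated assumptions: the class `DataflowOperator`, its `push` method, and the per-port queues are hypothetical names chosen for illustration and are not taken from the patent.

```python
from collections import deque
import operator


class DataflowOperator:
    """Hypothetical model of one dataflow-graph node mapped onto a processing element."""

    def __init__(self, fn, num_inputs):
        self.fn = fn
        self.inputs = [deque() for _ in range(num_inputs)]  # one queue per input port
        self.output = deque()

    def push(self, port, value):
        """Deliver one operand to an input port, then check whether the node can fire."""
        self.inputs[port].append(value)
        self._try_fire()

    def _try_fire(self):
        # Fire only while every input queue holds a value (a complete operand set).
        while all(self.inputs):
            operands = [q.popleft() for q in self.inputs]
            self.output.append(self.fn(*operands))


add = DataflowOperator(operator.add, num_inputs=2)
add.push(0, 3)
fired_early = len(add.output)  # nothing has fired yet: port 1 is still empty
add.push(1, 4)
result = add.output.popleft()  # both operands present, so the node fired
```

No program counter appears in the model: execution is driven purely by operand availability, which is the property the abstract attributes to the processing elements.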
-
Publication No.: US10474375B2
Publication Date: 2019-11-12
Application No.: US15396049
Filing Date: 2016-12-30
Applicant: INTEL CORPORATION
Inventor: Kermin Elliott Fleming, Jr. , Simon C. Steely, Jr. , Kent D. Glossop
Abstract: An integrated circuit includes a processor to execute instructions and to interact with memory, and acceleration hardware, to execute a sub-program corresponding to instructions. A set of input queues includes a store address queue to receive, from the acceleration hardware, a first address of the memory, the first address associated with a store operation and a store data queue to receive, from the acceleration hardware, first data to be stored at the first address of the memory. The set of input queues also includes a completion queue to buffer response data for a load operation. A disambiguator circuit, coupled to the set of input queues and the memory, is to, responsive to determining the load operation, which succeeds the store operation, has an address conflict with the first address, copy the first data from the store data queue into the completion queue for the load operation.
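The forwarding behavior described here (a load that follows a store to the same address receives the pending store data through the completion queue) might be modeled roughly as follows. The class `Disambiguator`, its method names, and the use of a Python dict as the memory are assumptions made for illustration only; the patent describes hardware queues, not this API.

```python
from collections import deque


class Disambiguator:
    """Hypothetical software model of store-to-load forwarding between queues."""

    def __init__(self, memory):
        self.memory = memory           # dict standing in for the memory
        self.store_addr_q = deque()    # addresses of stores not yet written
        self.store_data_q = deque()    # data for those stores, in the same order
        self.completion_q = deque()    # buffered response data for loads

    def enqueue_store(self, addr, data):
        self.store_addr_q.append(addr)
        self.store_data_q.append(data)

    def load(self, addr):
        # Check pending stores, youngest first, for an address conflict.
        pending = zip(reversed(self.store_addr_q), reversed(self.store_data_q))
        for store_addr, store_data in pending:
            if store_addr == addr:
                # Conflict: copy the store data straight into the completion queue.
                self.completion_q.append(store_data)
                return
        # No conflict: the load is served from memory.
        self.completion_q.append(self.memory[addr])

    def drain_store(self):
        """Write the oldest pending store to memory."""
        self.memory[self.store_addr_q.popleft()] = self.store_data_q.popleft()


memory = {0x10: 1, 0x20: 2}
d = Disambiguator(memory)
d.enqueue_store(0x10, 99)
d.load(0x10)   # conflicts with the pending store, so it is forwarded
d.load(0x20)   # no conflict, so it reads memory
responses = list(d.completion_q)
```

The load to `0x10` observes the new value even though memory has not been updated yet, which is the ordering guarantee the disambiguator is meant to preserve.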
-
Publication No.: US10469397B2
Publication Date: 2019-11-05
Application No.: US15640540
Filing Date: 2017-07-01
Applicant: Intel Corporation
Inventor: Kermin Fleming , Kent D. Glossop , Simon C. Steely, Jr.
IPC: H04L12/721 , H04L12/801 , H04L12/863 , H04L12/935 , H04L12/937
Abstract: Systems, methods, and apparatuses relating to configurable network-based dataflow operator circuits are described. In one embodiment, a processor includes a spatial array of processing elements, and a packet switched communications network to route data within the spatial array between processing elements according to a dataflow graph to perform a first dataflow operation of the dataflow graph, wherein the packet switched communications network further comprises a plurality of network dataflow endpoint circuits to perform a second dataflow operation of the dataflow graph.
-
Publication No.: US20190004878A1
Publication Date: 2019-01-03
Application No.: US15640542
Filing Date: 2017-07-01
Applicant: Intel Corporation
Inventor: Michael C. Adler , Kermin Fleming , Kent D. Glossop , Simon C. Steely, Jr.
Abstract: Systems, methods, and apparatuses relating to a configurable spatial accelerator are described. In one embodiment, a processor includes a plurality of processing elements; and an interconnect network between the plurality of processing elements to receive an input of two dataflow graphs each comprising a plurality of nodes, wherein a first dataflow graph and a second dataflow graph are to be overlaid into a first and second portion, respectively, of the interconnect network and a first and second subset, respectively, of the plurality of processing elements with each node represented as a dataflow operator in the plurality of processing elements, and the first and second subsets of the plurality of processing elements are to perform a first and second operation, respectively, when incoming first and second, respectively, operand sets arrive at the plurality of processing elements.
-
9.
Publication No.: US20200310797A1
Publication Date: 2020-10-01
Application No.: US16370915
Filing Date: 2019-03-30
Applicant: Intel Corporation
Inventor: Jesus Corbal , Rohan Sharma , Simon Steely, Jr. , Chinmay Ashok , Kent D. Glossop , Dennis Bradford , Paul Caprioli , Louise Huot , Kermin ChoFleming , Barry Tannenbaum
Abstract: Systems, methods, and apparatuses relating to swizzle operations and disable operations in a configurable spatial accelerator (CSA) are described. Certain embodiments herein provide for an encoding system for a specific set of swizzle primitives across a plurality of packed data elements in a CSA. In one embodiment, a CSA includes a plurality of processing elements, a circuit switched interconnect network between the plurality of processing elements, and a configuration register within each processing element to store a configuration value having a first portion that, when set to a first value that indicates a first mode, causes the processing element to pass an input value to operation circuitry of the processing element without modifying the input value, and, when set to a second value that indicates a second mode, causes the processing element to perform a swizzle operation on the input value to form a swizzled input value before sending the swizzled input value to the operation circuitry of the processing element, and a second portion that causes the processing element to perform an operation indicated by the second portion of the configuration value on the input value in the first mode and the swizzled input value in the second mode with the operation circuitry.
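The two-part configuration value described in this abstract (a first portion selecting pass-through or swizzle mode, a second portion selecting the operation) can be sketched in software. Everything concrete below is an assumption for illustration: the bit layout (bit 0 as the mode, the remaining bits as the opcode), the `swap_pairs` swizzle primitive, and the `pairwise_subtract` operation are hypothetical and are not the encoding claimed in the patent.

```python
PASS_THROUGH, SWIZZLE = 0, 1  # hypothetical values for the mode portion


def swap_pairs(lanes):
    """Hypothetical swizzle primitive: swap adjacent packed data elements."""
    out = list(lanes)
    for i in range(0, len(out) - 1, 2):
        out[i], out[i + 1] = out[i + 1], out[i]
    return out


def pairwise_subtract(lanes):
    """Hypothetical operation: subtract the second element of each pair from the first."""
    return [lanes[i] - lanes[i + 1] for i in range(0, len(lanes) - 1, 2)]


OPERATIONS = {0: pairwise_subtract}  # opcode -> operation circuitry stand-in


def process(config, lanes):
    mode = config & 1      # first portion: pass-through or swizzle
    opcode = config >> 1   # second portion: which operation to perform
    if mode == SWIZZLE:
        lanes = swap_pairs(lanes)  # swizzle the input before the operation
    return OPERATIONS[opcode](lanes)


packed = [5, 3, 10, 4]
plain = process((0 << 1) | PASS_THROUGH, packed)
swizzled = process((0 << 1) | SWIZZLE, packed)
```

With the same opcode, only the mode bit changes the result: the pass-through configuration subtracts the pairs as given, while the swizzle configuration reorders each pair first.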
-
Publication No.: US10572376B2
Publication Date: 2020-02-25
Application No.: US15396038
Filing Date: 2016-12-30
Applicant: INTEL CORPORATION
Inventor: Kermin Elliott Fleming, Jr. , Simon C. Steely, Jr. , Kent D. Glossop
Abstract: An integrated circuit includes a memory interface, coupled to a memory to store data corresponding to instructions, and an operations queue to buffer memory operations corresponding to the instructions. The integrated circuit may include acceleration hardware to execute a sub-program corresponding to the instructions. A set of input queues may include an address queue to receive, from the acceleration hardware, an address of the memory associated with a second memory operation of the memory operations, and a dependency queue to receive, from the acceleration hardware, a dependency token associated with the address. The dependency token indicates a dependency on data generated by a first memory operation of the memory operations. A scheduler circuit may schedule issuance of the second memory operation to the memory in response to the dependency queue receiving the dependency token and the address queue receiving the address.
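The scheduling condition in this abstract (the second memory operation issues only once both its address and its dependency token have arrived) could be modeled along these lines. The class `Scheduler`, its method names, and the `issued` list are hypothetical, and the token is reduced to a simple marker; this is a sketch of the stated condition, not the patented circuit.

```python
from collections import deque


class Scheduler:
    """Hypothetical model: issue a memory operation when address and token are both queued."""

    def __init__(self):
        self.address_q = deque()     # addresses from the acceleration hardware
        self.dependency_q = deque()  # dependency tokens from earlier memory operations
        self.issued = []             # addresses issued to memory, in order

    def push_address(self, addr):
        self.address_q.append(addr)
        self._schedule()

    def push_token(self):
        # The token only signals that the earlier operation's data is available.
        self.dependency_q.append(True)
        self._schedule()

    def _schedule(self):
        # Issue while an address and a matching dependency token are both present.
        while self.address_q and self.dependency_q:
            self.dependency_q.popleft()
            self.issued.append(self.address_q.popleft())


s = Scheduler()
s.push_address(0x40)
issued_before_token = list(s.issued)  # the address alone is not enough
s.push_token()
issued_after_token = list(s.issued)   # the token arrived, so the operation issued
```

Because the token can arrive before or after the address, the check runs on every enqueue; either arrival order produces the same issue sequence.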