-
1.
Publication No.: US10565134B2
Publication Date: 2020-02-18
Application No.: US15859473
Filing Date: 2017-12-30
Applicant: Intel Corporation
Inventor: Kermin E. Fleming, Jr. , Ping Zou , Mitchell Diamond
IPC: G06F13/16
Abstract: Systems, methods, and apparatuses relating to multicast in a configurable spatial accelerator are described. In one embodiment, an accelerator includes a first output buffer of a first processing element coupled to a first input buffer of a second processing element and a second input buffer of a third processing element; and the first processing element determines that it was able to complete a transmission in a previous cycle when the first processing element observed for both the second processing element and the third processing element that either a speculation value was set to a value to indicate a dataflow token was stored in its input buffer (e.g., as indicated by a reception value (e.g., bit)) or a backpressure value was set to a value to indicate that storage is to be available in its input buffer before dequeuing the dataflow token from the first output buffer.
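Illustrative sketch (not part of the patent record): the completion condition in this abstract can be read as a per-consumer "already received OR has space" check that must hold for every consumer before the producer dequeues. The names `ConsumerStatus`, `token_received`, `has_space`, and `can_dequeue` are hypothetical and chosen only for this example; the actual signal encoding is not given in the abstract.

```python
from dataclasses import dataclass


@dataclass
class ConsumerStatus:
    """Signals the producer observes for one consumer (hypothetical names)."""
    token_received: bool  # speculation/reception value: token already stored
    has_space: bool       # backpressure value: input buffer storage is available


def can_dequeue(consumers):
    """True when every consumer either already holds the token or can accept it."""
    return all(c.token_received or c.has_space for c in consumers)
```

Under this reading, a single consumer that has neither stored the token nor freed storage is enough to keep the token in the producer's output buffer.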
-
2.
Publication No.: US11200186B2
Publication Date: 2021-12-14
Application No.: US16024854
Filing Date: 2018-06-30
Applicant: Intel Corporation
Inventor: Kermin E. Fleming, Jr. , Simon C. Steely, Jr. , Kent D. Glossop , Mitchell Diamond , Benjamin Keen , Dennis Bradford , Fabrizio Petrini , Barry Tannenbaum , Yongzhi Zhang
Abstract: Systems, methods, and apparatuses relating to operations in a configurable spatial accelerator are described. In one embodiment, a configurable spatial accelerator includes a first processing element that includes a configuration register within the first processing element to store a configuration value that causes the first processing element to perform an operation according to the configuration value, a plurality of input queues, an input controller to control enqueue and dequeue of values into the plurality of input queues according to the configuration value, a plurality of output queues, and an output controller to control enqueue and dequeue of values into the plurality of output queues according to the configuration value.
-
3.
Publication No.: US10853073B2
Publication Date: 2020-12-01
Application No.: US16024849
Filing Date: 2018-06-30
Applicant: Intel Corporation
Inventor: Kermin E. Fleming, Jr. , Ping Zou , Mitchell Diamond , Benjamin Keen
Abstract: Systems, methods, and apparatuses relating to conditional operations in a configurable spatial accelerator are described. In one embodiment, a hardware accelerator includes an output buffer of a first processing element coupled to an input buffer of a second processing element via a first data path that is to send a first dataflow token from the output buffer of the first processing element to the input buffer of the second processing element when the first dataflow token is received in the output buffer of the first processing element; an output buffer of a third processing element coupled to the input buffer of the second processing element via a second data path that is to send a second dataflow token from the output buffer of the third processing element to the input buffer of the second processing element when the second dataflow token is received in the output buffer of the third processing element; a first backpressure path from the input buffer of the second processing element to the first processing element to indicate to the first processing element when storage is not available in the input buffer of the second processing element; a second backpressure path from the input buffer of the second processing element to the third processing element to indicate to the third processing element when storage is not available in the input buffer of the second processing element; and a scheduler of the second processing element to cause storage of the first dataflow token from the first data path into the input buffer of the second processing element when both the first backpressure path indicates storage is available in the input buffer of the second processing element and a conditional token received in a conditional queue of the second processing element from another processing element is a first value.
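Illustrative sketch (not part of the patent record): the scheduler behavior in this abstract resembles a "pick" between two producers, gated by a conditional token and by free space in the consumer's input buffer. The class `PickElement`, the encoding of the first value as `0`, and the rule that any other value selects the second data path are assumptions made for this example only.

```python
from collections import deque

FIRST_VALUE = 0  # assumed encoding of the "first value" of the conditional token


class PickElement:
    """Toy model of the second processing element (hypothetical names)."""

    def __init__(self, capacity=2):
        self.capacity = capacity
        self.input_buffer = deque()  # shared input buffer fed by both data paths
        self.cond_queue = deque()    # conditional tokens from another element

    def has_space(self):
        # Models the backpressure signal: True means storage is available.
        return len(self.input_buffer) < self.capacity

    def step(self, first_output, second_output):
        """Run one scheduling step; return True if a token was stored."""
        if not self.cond_queue or not self.has_space():
            return False
        # Assumption: any value other than FIRST_VALUE selects the second path.
        if self.cond_queue[0] == FIRST_VALUE:
            source = first_output
        else:
            source = second_output
        if not source:
            return False  # the selected producer has no token yet
        self.cond_queue.popleft()
        self.input_buffer.append(source.popleft())
        return True
```

The conditional token is consumed only when a data token is actually stored, so a stalled step leaves all queues unchanged.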
-
4.
Publication No.: US10891240B2
Publication Date: 2021-01-12
Application No.: US16024802
Filing Date: 2018-06-30
Applicant: Intel Corporation
Inventor: Suresh Mathew , Mitchell Diamond , Kermin E. Fleming, Jr.
IPC: G06F12/1027 , G06F3/06
Abstract: Systems, methods, and apparatuses relating to low latency communications in a configurable spatial accelerator are described. In one embodiment, a processor includes a spatial array of processing elements to receive an input of a dataflow graph comprising a plurality of nodes, a plurality of request address file circuits coupled to the spatial array of processing elements and a cache memory, each request address file circuit of the plurality of request address file circuits to access data in the cache memory in response to a request for data access from the spatial array of processing elements, a plurality of translation lookaside buffers comprising a translation lookaside buffer in each of the plurality of request address file circuits to provide an output of a physical address for an input of a virtual address, and a function controller to receive an interrupt that includes a first field, that when set to a first value, causes a shootdown message to be broadcast to the plurality of translation lookaside buffers to cause a shootdown in the plurality of translation lookaside buffers.
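Illustrative sketch (not part of the patent record): the shootdown path in this abstract can be pictured as a controller that inspects one field of an interrupt and, when it holds a particular value, broadcasts an invalidation to every translation lookaside buffer. The names `Tlb`, `handle_interrupt`, and `SHOOTDOWN_REQUESTED`, the value `1`, and the choice to clear every entry on shootdown are assumptions for this example.

```python
SHOOTDOWN_REQUESTED = 1  # assumed encoding of the interrupt's first field


class Tlb:
    """Toy translation lookaside buffer: virtual page -> physical page."""

    def __init__(self):
        self.entries = {}

    def translate(self, virtual_page):
        # Returns None on a miss.
        return self.entries.get(virtual_page)

    def shootdown(self):
        # Assumption: a shootdown invalidates every cached translation.
        self.entries.clear()


def handle_interrupt(first_field, tlbs):
    """Broadcast a shootdown to all TLBs when the field requests it."""
    if first_field != SHOOTDOWN_REQUESTED:
        return False
    for tlb in tlbs:
        tlb.shootdown()
    return True
```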
-
5.
Publication No.: US10402168B2
Publication Date: 2019-09-03
Application No.: US15283295
Filing Date: 2016-10-01
Applicant: Intel Corporation
Abstract: A floating point multiply-add unit having inputs coupled to receive a floating point multiplier data element, a floating point multiplicand data element, and a floating point addend data element. The multiply-add unit including a mantissa multiplier to multiply a mantissa of the multiplier data element and a mantissa of the multiplicand data element to calculate a mantissa product. The mantissa multiplier including a most significant bit portion to calculate most significant bits of the mantissa product, and a least significant bit portion to calculate least significant bits of the mantissa product. The mantissa multiplier has a plurality of different possible sizes of the least significant bit portion. Energy consumption reduction logic to selectively reduce energy consumption of the least significant bit portion, but not the most significant bit portion, to cause the least significant bit portion to not calculate the least significant bits of the mantissa product.
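Illustrative sketch (not part of the patent record): one simple way to model a mantissa multiplier whose least significant bit portion is not calculated is to drop every partial product that falls in a low-order column. The function name `truncated_mantissa_product` and the parameters `width` and `gated_columns` are hypothetical; the abstract does not describe how the hardware partitions the multiplier.

```python
def truncated_mantissa_product(a, b, width, gated_columns):
    """Multiply two unsigned `width`-bit mantissas, skipping low columns.

    A partial product a_i * b_j lands in column i + j. Columns below
    `gated_columns` model the gated least significant bit portion and are
    never accumulated, so their carries are lost as well.
    """
    total = 0
    for i in range(width):
        if not (a >> i) & 1:
            continue
        for j in range(width):
            if (b >> j) & 1 and i + j >= gated_columns:
                total += 1 << (i + j)
    return total
```

With `gated_columns=0` the function returns the exact product; larger values trade low-order accuracy for skipped work, and the result never exceeds the exact product.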
-
6.
Publication No.: US11593295B2
Publication Date: 2023-02-28
Application No.: US17550875
Filing Date: 2021-12-14
Applicant: Intel Corporation
Inventor: Kermin E. Fleming, Jr. , Simon C. Steely, Jr. , Kent D. Glossop , Mitchell Diamond , Benjamin Keen , Dennis Bradford , Fabrizio Petrini , Barry Tannenbaum , Yongzhi Zhang
Abstract: Systems, methods, and apparatuses relating to operations in a configurable spatial accelerator are described. In one embodiment, a configurable spatial accelerator includes a first processing element that includes a configuration register within the first processing element to store a configuration value that causes the first processing element to perform an operation according to the configuration value, a plurality of input queues, an input controller to control enqueue and dequeue of values into the plurality of input queues according to the configuration value, a plurality of output queues, and an output controller to control enqueue and dequeue of values into the plurality of output queues according to the configuration value.
-
7.
Publication No.: US10564980B2
Publication Date: 2020-02-18
Application No.: US15944761
Filing Date: 2018-04-03
Applicant: Intel Corporation
Inventor: Kermin E. Fleming, Jr. , Ping Zou , Mitchell Diamond , Benjamin Keen
Abstract: Systems, methods, and apparatuses relating to conditional queues in a configurable spatial accelerator are described. In one embodiment, a configurable spatial accelerator includes a first output buffer of a first processing element coupled to a first input buffer of a second processing element and a second input buffer of a third processing element via a data path that is to send a dataflow token to the first input buffer of the second processing element and the second input buffer of the third processing element when the dataflow token is received in the first output buffer of the first processing element; a first backpressure path from the first input buffer of the second processing element to the first processing element to indicate to the first processing element when storage is not available in the first input buffer of the second processing element; a second backpressure path from the second input buffer of the third processing element to the first processing element to indicate to the first processing element when storage is not available in the second input buffer of the third processing element; and a scheduler of the second processing element to cause storage of the dataflow token from the data path into the first input buffer of the second processing element when both the first backpressure path indicates storage is available in the first input buffer of the second processing element and a conditional token received in a conditional queue of the second processing element from another processing element is a true conditional token.
-
8.
Publication No.: US10558575B2
Publication Date: 2020-02-11
Application No.: US15396402
Filing Date: 2016-12-30
Applicant: Intel Corporation
Inventor: Kermin E. Fleming, Jr. , Kent D. Glossop , Simon C. Steely, Jr. , Jinjie Tang , Alan G. Gara
IPC: G06F12/08 , G06F9/30 , G06F9/38 , G06F12/0862 , G06F12/0842 , G06F12/0875
Abstract: Systems, methods, and apparatuses relating to a configurable spatial accelerator are described. In one embodiment, a processor includes a core with a decoder to decode an instruction into a decoded instruction and an execution unit to execute the decoded instruction to perform a first operation; a plurality of processing elements; and an interconnect network between the plurality of processing elements to receive an input of a dataflow graph comprising a plurality of nodes, wherein the dataflow graph is to be overlaid into the interconnect network and the plurality of processing elements with each node represented as a dataflow operator in the plurality of processing elements, and the plurality of processing elements is to perform a second operation when an incoming operand set arrives at the plurality of processing elements.
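Illustrative sketch (not part of the patent record): the execution model in this abstract, where each graph node becomes a dataflow operator that acts when its operand set arrives, can be mimicked in software with per-input queues and a firing rule. The class `DataflowOperator` and its methods are hypothetical names for this example and do not reflect the hardware interface.

```python
from collections import deque


class DataflowOperator:
    """Toy dataflow node that fires once all of its inputs hold a value."""

    def __init__(self, fn, num_inputs):
        self.fn = fn
        self.inputs = [deque() for _ in range(num_inputs)]
        self.consumers = []  # list of (operator, input port) pairs

    def connect(self, consumer, port):
        # Models an interconnect path from this node's output to a consumer.
        self.consumers.append((consumer, port))

    def push(self, port, value):
        self.inputs[port].append(value)

    def try_fire(self):
        """Fire if a full operand set is present; return the result or None."""
        if not all(self.inputs):  # an empty deque is falsy
            return None
        result = self.fn(*(queue.popleft() for queue in self.inputs))
        for consumer, port in self.consumers:
            consumer.push(port, result)
        return result
```

For example, wiring an adder into a multiplier evaluates `(2 + 3) * 4`: the adder stays idle until both operands are pushed, then forwards 5 to the multiplier, which fires once its second operand is also present.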
-
9.
Publication No.: US10459866B1
Publication Date: 2019-10-29
Application No.: US16024801
Filing Date: 2018-06-30
Applicant: Intel Corporation
Inventor: Kermin E. Fleming, Jr. , Mitchell Diamond , Ping Zou , Benjamin Keen
Abstract: Systems, methods, and apparatuses relating to integrated control and data processing in a configurable spatial accelerator are described. In one embodiment, a processor includes a core with a decoder to decode an instruction into a decoded instruction and an execution unit to execute the decoded instruction to perform a first operation; a plurality of processing elements; a network between the plurality of processing elements to transfer values between the plurality of processing elements; and a first processing element of the plurality of processing elements including a first plurality of input queues having a first width coupled to the network, a second plurality of input queues having a second, larger width coupled to the network, at least one first output queue having the first width coupled to the network, at least one second output queue having the second, larger width coupled to the network, a first operation circuitry coupled to the first plurality of input queues having the first width, a second operation circuitry coupled to the second plurality of input queues having the second, larger width, and a configuration register within the first processing element to store a configuration value that causes the first operation circuitry to perform a second operation on values from the first plurality of input queues to create a first resultant value, and when the first resultant value is a first value, the second operation circuitry is to perform a third operation on values from the second plurality of input queues to create a second resultant value and store the second resultant value in the at least one second output queue.
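Illustrative sketch (not part of the patent record): the pairing of narrow and wide queues in this abstract can be modeled as a narrow operation whose result decides whether a wide operation runs. The class `IntegratedPE`, its attribute names, and the behavior when the narrow result is not the trigger value (narrow operands are consumed and the wide operands are left in place) are assumptions for this example; the abstract does not specify that case.

```python
from collections import deque


class IntegratedPE:
    """Toy processing element with narrow control and wide data queues."""

    def __init__(self, narrow_op, wide_op, trigger):
        self.narrow_op = narrow_op  # operates on values from the narrow queues
        self.wide_op = wide_op      # operates on values from the wide queues
        self.trigger = trigger      # the "first value" that enables wide_op
        self.narrow_in = (deque(), deque())
        self.wide_in = (deque(), deque())
        self.wide_out = deque()

    def step(self):
        """Run one step; return True if the narrow operands were consumed."""
        if not all(self.narrow_in):
            return False
        control = self.narrow_op(*(queue[0] for queue in self.narrow_in))
        if control == self.trigger and not all(self.wide_in):
            return False  # wait until the wide operands arrive
        for queue in self.narrow_in:
            queue.popleft()
        if control == self.trigger:
            operands = [queue.popleft() for queue in self.wide_in]
            self.wide_out.append(self.wide_op(*operands))
        return True
```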