Apparatus, methods, and systems for multicast in a configurable spatial accelerator

    公开(公告)号:US10565134B2

    公开(公告)日:2020-02-18

    申请号:US15859473

    申请日:2017-12-30

    Abstract: Systems, methods, and apparatuses relating to multicast in a configurable spatial accelerator are described. In one embodiment, an accelerator includes a first output buffer of a first processing element coupled to a first input buffer of a second processing element and a second input buffer of a third processing element; and the first processing element determines that it was able to complete a transmission in a previous cycle when the first processing element observed for both the second processing element and the third processing element that either a speculation value was set to a value to indicate a dataflow token was stored in its input buffer (e.g., as indicated by a reception value (e.g., bit)) or a backpressure value was set to a value to indicate that storage is to be available in its input buffer before dequeuing the dataflow token from the first output buffer.

    Apparatuses, methods, and systems for conditional operations in a configurable spatial accelerator

    公开(公告)号:US10853073B2

    公开(公告)日:2020-12-01

    申请号:US16024849

    申请日:2018-06-30

    Abstract: Systems, methods, and apparatuses relating to conditional operations in a configurable spatial accelerator are described. In one embodiment, a hardware accelerator includes an output buffer of a first processing element coupled to an input buffer of a second processing element via a first data path that is to send a first dataflow token from the output buffer of the first processing element to the input buffer of the second processing element when the first dataflow token is received in the output buffer of the first processing element; an output buffer of a third processing element coupled to the input buffer of the second processing element via a second data path that is to send a second dataflow token from the output buffer of the third processing element to the input buffer of the second processing element when the second dataflow token is received in the output buffer of the third processing element; a first backpressure path from the input buffer of the second processing element to the first processing element to indicate to the first processing element when storage is not available in the input buffer of the second processing element; a second backpressure path from the input buffer of the second processing element to the third processing element to indicate to the third processing element when storage is not available in the input buffer of the second processing element; and a scheduler of the second processing element to cause storage of the first dataflow token from the first data path into the input buffer of the second processing element when both the first backpressure path indicates storage is available in the input buffer of the second processing element and a conditional token received in a conditional queue of the second processing element from another processing element is a first value.

    Apparatus, methods, and systems for low latency communication in a configurable spatial accelerator

    公开(公告)号:US10891240B2

    公开(公告)日:2021-01-12

    申请号:US16024802

    申请日:2018-06-30

    Abstract: Systems, methods, and apparatuses relating to low latency communications in a configurable spatial accelerator are described. In one embodiment, a processor includes a spatial array of processing elements to receive an input of a dataflow graph comprising a plurality of nodes, a plurality of request address file circuits coupled to the spatial array of processing elements and a cache memory, each request address file circuit of the plurality of request address file circuits to access data in the cache memory in response to a request for data access from the spatial array of processing elements, a plurality of translation lookaside buffers comprising a translation lookaside buffer in each of the plurality of request address file circuits to provide an output of a physical address for an input of a virtual address, and a function controller to receive an interrupt that includes a first field, that when set to a first value, causes a shootdown message to be broadcast to the plurality of translation lookaside buffers to cause a shootdown in the plurality of translation lookaside buffers.

    Low energy consumption mantissa multiplication for floating point multiply-add operations

    公开(公告)号:US10402168B2

    公开(公告)日:2019-09-03

    申请号:US15283295

    申请日:2016-10-01

    Abstract: A floating point multiply-add unit having inputs coupled to receive a floating point multiplier data element, a floating point multiplicand data element, and a floating point addend data element. The multiply-add unit including a mantissa multiplier to multiply a mantissa of the multiplier data element and a mantissa of the multiplicand data element to calculate a mantissa product. The mantissa multiplier including a most significant bit portion to calculate most significant bits of the mantissa product, and a least significant bit portion to calculate least significant bits of the mantissa product. The mantissa multiplier has a plurality of different possible sizes of the least significant bit portion. Energy consumption reduction logic to selectively reduce energy consumption of the least significant bit portion, but not the most significant bit portion, to cause the least significant bit portion to not calculate the least significant bits of the mantissa product.

    Apparatus, methods, and systems for conditional queues in a configurable spatial accelerator

    公开(公告)号:US10564980B2

    公开(公告)日:2020-02-18

    申请号:US15944761

    申请日:2018-04-03

    Abstract: Systems, methods, and apparatuses relating to conditional queues in a configurable spatial accelerator are described. In one embodiment, a configurable spatial accelerator includes a first output buffer of a first processing element coupled to a first input buffer of a second processing element and a second input buffer of a third processing element via a data path that is to send a dataflow token to the first input buffer of the second processing element and the second input buffer of the third processing element when the dataflow token is received in the first output buffer of the first processing element; a first backpressure path from the first input buffer of the second processing element to the first processing element to indicate to the first processing element when storage is not available in the first input buffer of the second processing element; a second backpressure path from the second input buffer of the third processing element to the first processing element to indicate to the first processing element when storage is not available in the second input buffer of the third processing element; and a scheduler of the second processing element to cause storage of the dataflow token from the data path into the first input buffer of the second processing element when both the first backpressure path indicates storage is available in the first input buffer of the second processing element and a conditional token received in a conditional queue of the second processing element from another processing element is a true conditional token.

    Processors, methods, and systems with a configurable spatial accelerator

    公开(公告)号:US10558575B2

    公开(公告)日:2020-02-11

    申请号:US15396402

    申请日:2016-12-30

    Abstract: Systems, methods, and apparatuses relating to a configurable spatial accelerator are described. In one embodiment, a processor includes a core with a decoder to decode an instruction into a decoded instruction and an execution unit to execute the decoded instruction to perform a first operation; a plurality of processing elements; and an interconnect network between the plurality of processing elements to receive an input of a dataflow graph comprising a plurality of nodes, wherein the dataflow graph is to be overlaid into the interconnect network and the plurality of processing elements with each node represented as a dataflow operator in the plurality of processing elements, and the plurality of processing elements is to perform a second operation when an incoming operand set arrives at the plurality of processing elements.

    Apparatuses, methods, and systems for integrated control and data processing in a configurable spatial accelerator

    公开(公告)号:US10459866B1

    公开(公告)日:2019-10-29

    申请号:US16024801

    申请日:2018-06-30

    Abstract: Systems, methods, and apparatuses relating to integrated control and data processing in a configurable spatial accelerator are described. In one embodiment, a processor includes a core with a decoder to decode an instruction into a decoded instruction and an execution unit to execute the decoded instruction to perform a first operation; a plurality of processing elements; a network between the plurality of processing elements to transfer values between the plurality of processing elements; and a first processing element of the plurality of processing elements including a first plurality of input queues having a first width coupled to the network, a second plurality of input queues having a second, larger width coupled to the network, at least one first output queue having the first width coupled to the network, at least one second output queue having the second, larger width coupled to the network, a first operation circuitry coupled to the first plurality of input queues having the first width, a second operation circuitry coupled to the second plurality of input queues having the second, larger width, and a configuration register within the first processing element to store a configuration value that causes the first operation circuitry to perform a second operation on values from the first plurality of input queues to create a first resultant value, and when the first resultant value is a first value, the second operation circuitry is to perform a third operation on values from the second plurality of input queues to create a second resultant value and store the second resultant value in the at least one second output queue.

Patent Agency Ranking