Apparatus, methods, and systems for memory consistency in a configurable spatial accelerator

    公开(公告)号:US10417175B2

    公开(公告)日:2019-09-17

    申请号:US15859466

    申请日:2017-12-30

    Abstract: Methods and apparatuses relating to consistency in an accelerator are described. In one embodiment, request address file (RAF) circuits are coupled to a spatial array by a first network, a memory is coupled to the RAF circuits by a second network, a RAF circuit is to not issue, into the second network, a request to the memory marked with a program order dependency on a previous request until receiving a first token generated by completion of the previous request to the memory by another RAF circuit, and a second RAF circuit is to not issue, into the second network, a second request to the memory marked with a program order dependency on a first request until receiving a second token sent by a first RAF circuit when a predetermined time period has lapsed since the first request was issued by the first RAF circuit into the second network.

    Loop execution with predicate computing for dataflow machines

    公开(公告)号:US10346145B2

    公开(公告)日:2019-07-09

    申请号:US15632123

    申请日:2017-06-23

    Abstract: Compilers for compiling computer programs and apparatuses including compilers are disclosed herein. A compiler may include one or more analyzers to parse and analyze source instructions of a computer program including identification of nested loops of the computer program. The compiler may also include a code generator coupled to the one or more analyzers to generate and output executable code for the computer program that executes on a data flow machine, including a data flow graph, based at least in part on results of the analysis. In embodiments, the executable code may include executable code that recursively computes predicates of identified nested loops for use to generate control signal for the data flow graph to allow execution of each loop to start when the loop's predicate is available, independent of whether any other loop is in execution or not. Other embodiments may be disclosed or claimed.

    Processors, methods, and systems with a configurable spatial accelerator

    公开(公告)号:US10558575B2

    公开(公告)日:2020-02-11

    申请号:US15396402

    申请日:2016-12-30

    Abstract: Systems, methods, and apparatuses relating to a configurable spatial accelerator are described. In one embodiment, a processor includes a core with a decoder to decode an instruction into a decoded instruction and an execution unit to execute the decoded instruction to perform a first operation; a plurality of processing elements; and an interconnect network between the plurality of processing elements to receive an input of a dataflow graph comprising a plurality of nodes, wherein the dataflow graph is to be overlaid into the interconnect network and the plurality of processing elements with each node represented as a dataflow operator in the plurality of processing elements, and the plurality of processing elements is to perform a second operation when an incoming operand set arrives at the plurality of processing elements.

    Runtime address disambiguation in acceleration hardware

    公开(公告)号:US10474375B2

    公开(公告)日:2019-11-12

    申请号:US15396049

    申请日:2016-12-30

    Abstract: An integrated circuit includes a processor to execute instructions and to interact with memory, and acceleration hardware, to execute a sub-program corresponding to instructions. A set of input queues includes a store address queue to receive, from the acceleration hardware, a first address of the memory, the first address associated with a store operation and a store data queue to receive, from the acceleration hardware, first data to be stored at the first address of the memory. The set of input queues also includes a completion queue to buffer response data for a load operation. A disambiguator circuit, coupled to the set of input queues and the memory, is to, responsive to determining the load operation, which succeeds the store operation, has an address conflict with the first address, copy the first data from the store data queue into the completion queue for the load operation.

    PROCESSORS, METHODS, AND SYSTEMS FOR A CONFIGURABLE SPATIAL ACCELERATOR WITH SECURITY, POWER REDUCTION, AND PERFORMACE FEATURES

    公开(公告)号:US20190004878A1

    公开(公告)日:2019-01-03

    申请号:US15640542

    申请日:2017-07-01

    Abstract: Systems, methods, and apparatuses relating to a configurable spatial accelerator are described. In one embodiment, a processor includes a plurality of processing elements; and an interconnect network between the plurality of processing elements to receive an input of two dataflow graphs each comprising a plurality of nodes, wherein a first dataflow graph and a second dataflow graph are be overlaid into a first and second portion, respectively, of the interconnect network and a first and second subset, respectively, of the plurality of processing elements with each node represented as a dataflow operator in the plurality of processing elements, and the first and second subsets of the plurality of processing elements are to perform a first and second operation, respectively, when incoming first and second, respectively, operand sets arrive at the plurality of processing elements.

    APPARATUSES, METHODS, AND SYSTEMS FOR SWIZZLE OPERATIONS IN A CONFIGURABLE SPATIAL ACCELERATOR

    公开(公告)号:US20200310797A1

    公开(公告)日:2020-10-01

    申请号:US16370915

    申请日:2019-03-30

    Abstract: Systems, methods, and apparatuses relating to swizzle operations and disable operations in a configurable spatial accelerator (CSA) are described. Certain embodiments herein provide for an encoding system for a specific set of swizzle primitives across a plurality of packed data elements in a CSA. In one embodiment, a CSA includes a plurality of processing elements, a circuit switched interconnect network between the plurality of processing elements, and a configuration register within each processing element to store a configuration value having a first portion that, when set to a first value that indicates a first mode, causes the processing element to pass an input value to operation circuitry of the processing element without modifying the input value, and, when set to a second value that indicates a second mode, causes the processing element to perform a swizzle operation on the input value to form a swizzled input value before sending the swizzled input value to the operation circuitry of the processing element, and a second portion that causes the processing element to perform an operation indicated by the second portion the configuration value on the input value in the first mode and the swizzled input value in the second mode with the operation circuitry.

    Memory ordering in acceleration hardware

    公开(公告)号:US10572376B2

    公开(公告)日:2020-02-25

    申请号:US15396038

    申请日:2016-12-30

    Abstract: An integrated circuit includes a memory interface, coupled to a memory to store data corresponding to instructions, and an operations queue to buffer memory operations corresponding to the instructions. The integrated circuit may include acceleration hardware to execute a sub-program corresponding to the instructions. A set of input queues may include an address queue to receive, from the acceleration hardware, an address of the memory associated with a second memory operation of the memory operations, and a dependency queue to receive, from the acceleration hardware, a dependency token associated with the address. The dependency token indicates a dependency on data generated by a first memory operation of the memory operations. A scheduler circuit may schedule issuance of the second memory operation to the memory in response to the dependency queue receiving the dependency token and the address queue receiving the address.

Patent Agency Ranking