PROCESSORS, METHODS, AND SYSTEMS FOR A CONFIGURABLE SPATIAL ACCELERATOR WITH SECURITY, POWER REDUCTION, AND PERFORMACE FEATURES

    公开(公告)号:US20190004878A1

    公开(公告)日:2019-01-03

    申请号:US15640542

    申请日:2017-07-01

    Abstract: Systems, methods, and apparatuses relating to a configurable spatial accelerator are described. In one embodiment, a processor includes a plurality of processing elements; and an interconnect network between the plurality of processing elements to receive an input of two dataflow graphs each comprising a plurality of nodes, wherein a first dataflow graph and a second dataflow graph are be overlaid into a first and second portion, respectively, of the interconnect network and a first and second subset, respectively, of the plurality of processing elements with each node represented as a dataflow operator in the plurality of processing elements, and the first and second subsets of the plurality of processing elements are to perform a first and second operation, respectively, when incoming first and second, respectively, operand sets arrive at the plurality of processing elements.

    SWITCHABLE TOPOLOGY MACHINE
    2.
    发明申请

    公开(公告)号:US20180113838A1

    公开(公告)日:2018-04-26

    申请号:US15637581

    申请日:2017-06-29

    CPC classification number: G06F15/17381 G06F9/38 G06F9/3897 G06F15/17343

    Abstract: Embodiments relate to a computational device including multiple processor tiles on a die that may have multiple switchable topologies. A topology of the computational device may include one or more virtual circuits. A virtual circuit may include multiple processor tiles. A processor tile of a virtual circuit of a topology may include a configuration vector to control a connection between the processor tile and a neighboring processor tile. A first topology of the computation device may correspond to a first phase of a computation of a program, and a second topology of the computation device may correspond to a second phase of the computation of the program. Other embodiments may be described and/or claimed.

    PROCESSORS, METHODS, AND SYSTEMS FOR DEBUGGING A CONFIGURABLE SPATIAL ACCELERATOR

    公开(公告)号:US20190095383A1

    公开(公告)日:2019-03-28

    申请号:US15719281

    申请日:2017-09-28

    Abstract: Systems, methods, and apparatuses relating to debugging a configurable spatial accelerator are described. In one embodiment, a processor includes a plurality of processing elements and an interconnect network between the plurality of processing elements to receive an input of a dataflow graph comprising a plurality of nodes, wherein the dataflow graph is to be overlaid into the interconnect network and the plurality of processing elements with each node represented as a dataflow operator in the plurality of processing elements, and the plurality of processing elements are to perform an operation by a respective, incoming operand set arriving at each of the dataflow operators of the plurality of processing elements. At least a first of the plurality of processing elements is to enter a halted state in response to being represented as a first of the plurality of dataflow operators.

    MEMORY ORDERING IN ACCELERATION HARDWARE
    6.
    发明申请

    公开(公告)号:US20180188997A1

    公开(公告)日:2018-07-05

    申请号:US15396038

    申请日:2016-12-30

    Abstract: An integrated circuit includes a memory interface, coupled to a memory to store data corresponding to instructions, and an operations queue to buffer memory operations corresponding to the instructions. The integrated circuit may include acceleration hardware to execute a sub-program corresponding to the instructions. A set of input queues may include an address queue to receive, from the acceleration hardware, an address of the memory associated with a second memory operation of the memory operations, and a dependency queue to receive, from the acceleration hardware, a dependency token associated with the address. The dependency token indicates a dependency on data generated by a first memory operation of the memory operations. A scheduler circuit may schedule issuance of the second memory operation to the memory in response to the dependency queue receiving the dependency token and the address queue receiving the address.

    APPARATUSES, METHODS, AND SYSTEMS FOR INSTRUCTIONS FOR LOADING A TILE OF A MATRIX OPERATIONS ACCELERATOR

    公开(公告)号:US20240220323A1

    公开(公告)日:2024-07-04

    申请号:US18149045

    申请日:2022-12-30

    CPC classification number: G06F9/5027 G06F5/012 G06F7/4876

    Abstract: Systems, methods, and apparatuses relating to floating-point support circuitry to implement floating-point operations on a two-dimensional grid of fixed-point processing elements are described. In one example, a hardware processor includes a two-dimensional grid of fixed-point processing elements; floating-point support circuitry coupled to the two-dimensional grid of fixed-point processing elements; storage for a first, a second, and a destination two-dimensional floating-point matrices coupled to the floating-point support circuitry; and controller circuitry to cause the two-dimensional grid of fixed-point processing elements and the floating-point support circuitry to: determine, by the floating-point support circuitry, an extreme exponent for each row of the first two-dimensional floating-point matrix and for each column of the second two-dimensional floating-point matrix, generate, by the floating-point support circuitry, a first fixed-point matrix from the first two-dimensional floating-point matrix and a second fixed-point matrix from the second two-dimensional floating-point matrix, generate, by the two-dimensional grid of fixed-point processing elements, corresponding fixed-point results by a multiplication of fixed-point elements of the first fixed-point matrix by corresponding fixed-point elements of the second fixed-point matrix, scale, by the floating-point support circuitry, the corresponding fixed-point results according to the extreme exponents to generate scaled fixed-point results, generate, by the floating-point support circuitry, a resultant floating-point matrix from the scaled fixed-point results, and store the resultant floating-point matrix into the destination two-dimensional floating-point matrix.

    LAYERED SUPER-RETICLE COMPUTING : ARCHITECTURES AND METHODS

    公开(公告)号:US20210255674A1

    公开(公告)日:2021-08-19

    申请号:US17174106

    申请日:2021-02-11

    Abstract: Embodiments herein may present an integrated circuit or a computing system having an integrated circuit, where the integrated circuit includes a physical network layer, a physical computing layer, and a physical memory layer, each having a set of dies, and a die including multiple tiles. The physical network layer further includes one or more signal pathways dynamically configurable between multiple pre-defined interconnect topologies for the multiple tiles, where each topology of the multiple pre-defined interconnect topologies corresponds to a communication pattern related to a workload. At least a tile in the physical computing layer is further arranged to move data to another tile in the physical computing layer or a storage cell of the physical memory layer through the one or more signal pathways in the physical network layer. Other embodiments may be described and/or claimed.

    LAYERED SUPER-RETICLE COMPUTING : ARCHITECTURES AND METHODS

    公开(公告)号:US20190354146A1

    公开(公告)日:2019-11-21

    申请号:US16416753

    申请日:2019-05-20

    Abstract: Embodiments herein may present an integrated circuit or a computing system having an integrated circuit, where the integrated circuit includes a physical network layer, a physical computing layer, and a physical memory layer, each having a set of dies, and a die including multiple tiles. The physical network layer further includes one or more signal pathways dynamically configurable between multiple pre-defined interconnect topologies for the multiple tiles, where each topology of the multiple pre-defined interconnect topologies corresponds to a communication pattern related to a workload. At least a tile in the physical computing layer is further arranged to move data to another tile in the physical computing layer or a storage cell of the physical memory layer through the one or more signal pathways in the physical network layer. Other embodiments may be described and/or claimed.

Patent Agency Ranking