Apparatuses, methods, and systems for memory interface circuit allocation in a configurable spatial accelerator

    Publication number: US10915471B2

    Publication date: 2021-02-09

    Application number: US16370928

    Application date: 2019-03-30

    Abstract: Systems, methods, and apparatuses relating to memory interface circuit allocation in a configurable spatial accelerator are described. In one embodiment, a configurable spatial accelerator (CSA) includes a plurality of processing elements; a plurality of request address file (RAF) circuits, and a circuit switched interconnect network between the plurality of processing elements and the RAF circuits. As a dataflow architecture, embodiments of CSA have a unique memory architecture where memory accesses are decoupled into an explicit request and response phase allowing pipelining through memory. Certain embodiments herein provide for an improved memory sub-system design via the improvements to allocation discussed herein.
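The decoupling described above — an explicit request phase and a separate response phase, so that multiple memory accesses can be in flight at once — can be illustrated with a toy model. The class and method names below are illustrative only, not taken from the patent; this is a minimal sketch of the request/response split, assuming a simple in-order completion of outstanding requests.

```python
from collections import deque

class RAFCircuit:
    """Toy model of a request address file (RAF) circuit: a memory
    access is split into an explicit request phase and a later
    response phase, so several loads can be pipelined through memory."""

    def __init__(self, memory):
        self.memory = memory
        self.pending = deque()  # requests issued but not yet completed

    def issue_load(self, addr):
        # Request phase: record the access and return immediately.
        self.pending.append(addr)

    def collect_response(self):
        # Response phase: complete the oldest outstanding request.
        addr = self.pending.popleft()
        return self.memory[addr]

memory = {0: 10, 4: 20, 8: 30}
raf = RAFCircuit(memory)

# Pipelining: three requests are issued before any response is taken.
for addr in (0, 4, 8):
    raf.issue_load(addr)
results = [raf.collect_response() for _ in range(3)]
```

In hardware the two phases run concurrently across many RAF circuits; the sketch only shows why splitting request from response lets new requests start before earlier responses return.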

    Update mask for handling interaction between fills and updates
    Invention grant (in force)

    Publication number: US09251073B2

    Publication date: 2016-02-02

    Application number: US13732242

    Application date: 2012-12-31

    CPC classification number: G06F12/0828 G06F12/0811 G06F12/0886

    Abstract: A multi-core processor implements a cache coherency protocol in which probe messages are address-ordered on a probe channel while responses are unordered on a response channel. When a first core generates a read of an address that misses in the first core's cache, a line fill is initiated. If a second core is writing the same address, the second core generates an update on the address-ordered probe channel. The second core's update may arrive before or after the first core's line fill returns. If the update arrives before the fill returns, a mask is maintained to indicate which portions of the line were modified by the update so that the late-arriving line fill only modifies portions of the line that were unaffected by the earlier-arriving update.

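The update-mask mechanism in the abstract is concrete enough to sketch: bytes written by an early-arriving update are marked in a mask, and the late-arriving fill skips exactly those bytes. The line size and the class below are illustrative assumptions, not details from the patent.

```python
LINE_SIZE = 8  # bytes per cache line (illustrative choice)

class PendingLine:
    """Tracks a cache line whose fill is still outstanding. Updates
    that arrive before the fill set bits in `mask`; the late fill
    then writes only the bytes the updates did not touch."""

    def __init__(self):
        self.data = bytearray(LINE_SIZE)
        self.mask = [False] * LINE_SIZE  # True = byte already updated

    def apply_update(self, offset, value):
        # Early-arriving update from another core's write.
        self.data[offset] = value
        self.mask[offset] = True

    def apply_fill(self, line_bytes):
        # Late-arriving line fill: skip bytes covered by earlier updates.
        for i, b in enumerate(line_bytes):
            if not self.mask[i]:
                self.data[i] = b

line = PendingLine()
line.apply_update(2, 0xFF)           # the update arrives first
line.apply_fill(bytes(range(8)))     # the fill returns afterwards
```

After both events, byte 2 keeps the updated value 0xFF while every other byte holds the fill data, which is exactly the interaction the mask exists to resolve.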

    APPARATUSES, METHODS, AND SYSTEMS FOR MEMORY INTERFACE CIRCUIT ALLOCATION IN A CONFIGURABLE SPATIAL ACCELERATOR

    Publication number: US20200310994A1

    Publication date: 2020-10-01

    Application number: US16370928

    Application date: 2019-03-30

    Abstract: Systems, methods, and apparatuses relating to memory interface circuit allocation in a configurable spatial accelerator are described. In one embodiment, a configurable spatial accelerator (CSA) includes a plurality of processing elements; a plurality of request address file (RAF) circuits, and a circuit switched interconnect network between the plurality of processing elements and the RAF circuits. As a dataflow architecture, embodiments of CSA have a unique memory architecture where memory accesses are decoupled into an explicit request and response phase allowing pipelining through memory. Certain embodiments herein provide for an improved memory sub-system design via the improvements to allocation discussed herein.

    Processors and methods for privileged configuration in a spatial array

    Publication number: US10445098B2

    Publication date: 2019-10-15

    Application number: US15721809

    Application date: 2017-09-30

    Abstract: Methods and apparatuses relating to privileged configuration in spatial arrays are described. In one embodiment, a processor includes processing elements; an interconnect network between the processing elements; and a configuration controller coupled to a first subset and a second, different subset of the plurality of processing elements, the first subset having an output coupled to an input of the second, different subset, wherein the configuration controller is to configure the interconnect network between the first subset and the second, different subset of the plurality of processing elements to not allow communication on the interconnect network between the first subset and the second, different subset when a privilege bit is set to a first value and to allow communication on the interconnect network between the first subset and the second, different subset of the plurality of processing elements when the privilege bit is set to a second value.
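The privilege-bit gating in this abstract reduces to a simple rule: traffic between the two subsets of processing elements is blocked when the bit holds one value and allowed when it holds the other. The following is a minimal sketch of that rule; the class and method names are invented for illustration and do not come from the patent.

```python
class ConfigController:
    """Toy model of privilege-bit gating: the interconnect between a
    first and a second subset of processing elements only forwards
    values when the privilege bit is set to the permitting value."""

    def __init__(self):
        self.privilege_bit = 0  # first value: communication disallowed

    def send(self, value):
        # Forward a value from the first subset to the second subset
        # only when the privilege bit is set to the second value.
        if self.privilege_bit == 1:
            return value
        return None  # communication on the interconnect is blocked

ctrl = ConfigController()
blocked = ctrl.send(42)    # bit at first value: nothing delivered
ctrl.privilege_bit = 1
allowed = ctrl.send(42)    # bit at second value: value delivered
```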

    Memory circuits and methods for distributed memory hazard detection and error recovery

    Publication number: US10515049B1

    Publication date: 2019-12-24

    Application number: US15640541

    Application date: 2017-07-01

    Abstract: Methods and apparatuses relating to distributed memory hazard detection and error recovery are described. In one embodiment, a memory circuit includes a memory interface circuit to service memory requests from a spatial array of processing elements for data stored in a plurality of cache banks; and a hazard detection circuit in each of the plurality of cache banks, wherein a first hazard detection circuit for a speculative memory load request from the memory interface circuit, that is marked with a potential dynamic data dependency, to an address within a first cache bank of the first hazard detection circuit, is to mark the address for tracking of other memory requests to the address, store data from the address in speculative completion storage, and send the data from the speculative completion storage to the spatial array of processing elements when a memory dependency token is received for the speculative memory load request.
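The hazard-detection flow in this abstract has three visible steps: mark the address for tracking, park the speculatively loaded data in completion storage, and release it only when a memory-dependency token arrives. A toy sketch of those steps, with invented names and a plain dictionary standing in for a cache bank:

```python
class HazardBank:
    """Toy cache bank with speculative-load handling: a speculative
    load's result is parked in completion storage and only released
    to the consumer once a memory-dependency token is received."""

    def __init__(self, data):
        self.data = dict(data)
        self.tracked = set()       # addresses watched for other requests
        self.completion = {}       # addr -> speculatively read data

    def speculative_load(self, addr):
        # Mark the address so later requests to it can be tracked,
        # then buffer the data in speculative completion storage.
        self.tracked.add(addr)
        self.completion[addr] = self.data[addr]

    def deliver(self, addr):
        # Memory-dependency token received: release the buffered value.
        self.tracked.discard(addr)
        return self.completion.pop(addr)

bank = HazardBank({0x10: 7})
bank.speculative_load(0x10)    # value parked, address under tracking
value = bank.deliver(0x10)     # token arrives, value released
```

The real circuit additionally checks tracked addresses against other in-flight requests to detect dynamic dependencies; the sketch only shows the park-then-release lifecycle.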

    Apparatus, methods, and systems with a configurable spatial accelerator

    Publication number: US10445250B2

    Publication date: 2019-10-15

    Application number: US15859454

    Application date: 2017-12-30

    Abstract: Systems, methods, and apparatuses relating to a configurable spatial accelerator are described. In one embodiment, a processor includes a core with a decoder to decode an instruction into a decoded instruction and an execution unit to execute the decoded instruction to perform a first operation; a plurality of processing elements; and an interconnect network between the plurality of processing elements to receive an input of a dataflow graph comprising a plurality of nodes, wherein the dataflow graph is to be overlaid into the interconnect network and the plurality of processing elements with each node represented as a dataflow operator in the plurality of processing elements, and the plurality of processing elements are to perform a second operation by a respective, incoming operand set arriving at each of the dataflow operators of the plurality of processing elements.
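The firing rule in this abstract — each processing element performs its operation when a complete incoming operand set arrives at its dataflow operator — can be shown with a toy operator. The class below is an illustrative sketch, not the patented design: it simply buffers operands and fires once the set is full.

```python
class DataflowOperator:
    """Toy dataflow operator: it fires only when a complete incoming
    operand set has arrived, as in the graph-overlay description."""

    def __init__(self, arity, fn):
        self.arity = arity        # size of a full operand set
        self.fn = fn              # the operation this node performs
        self.operands = []

    def receive(self, value):
        self.operands.append(value)
        if len(self.operands) == self.arity:
            result = self.fn(*self.operands)
            self.operands = []    # ready for the next operand set
            return result
        return None  # still waiting for the rest of the operand set

# A node of the dataflow graph mapped onto one processing element.
add = DataflowOperator(2, lambda a, b: a + b)
first = add.receive(3)     # operand set incomplete: no firing
second = add.receive(4)    # set complete: operator fires with 3 + 4
```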

    Processors, methods, and systems with a configurable spatial accelerator having a sequencer dataflow operator

    Publication number: US10380063B2

    Publication date: 2019-08-13

    Application number: US15721802

    Application date: 2017-09-30

    Abstract: Systems, methods, and apparatuses relating to a sequencer dataflow operator of a configurable spatial accelerator are described. In one embodiment, an interconnect network between a plurality of processing elements receives an input of a dataflow graph comprising a plurality of nodes forming a loop construct, wherein the dataflow graph is overlaid into the interconnect network and the plurality of processing elements with each node represented as a dataflow operator in the plurality of processing elements and at least one dataflow operator controlled by a sequencer dataflow operator of the plurality of processing elements, and the plurality of processing elements is to perform an operation when an incoming operand set arrives at the plurality of processing elements and the sequencer dataflow operator generates control signals for the at least one dataflow operator in the plurality of processing elements.
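The distinguishing element here is the sequencer dataflow operator, which generates control signals for the operators inside a loop construct. A minimal sketch of that control relationship, assuming a fixed trip count and invented names throughout:

```python
class Sequencer:
    """Toy sequencer dataflow operator: it emits the control signals
    that drive a subordinate operator once per loop iteration."""

    def __init__(self, trip_count):
        self.trip_count = trip_count

    def control_signals(self):
        # One 'fire' signal per loop iteration, then a final 'done'.
        return ["fire"] * self.trip_count + ["done"]

def run_loop(sequencer, body, state):
    """Execute a controlled operator each time the sequencer fires."""
    for signal in sequencer.control_signals():
        if signal == "fire":
            state = body(state)   # the controlled dataflow operator
    return state

# Accumulate 1 on each of four iterations under sequencer control.
total = run_loop(Sequencer(4), lambda s: s + 1, 0)
```

In the spatial array the sequencer and the controlled operators run as separate processing elements exchanging these signals over the interconnect; the sketch serializes that interaction for clarity.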

    APPARATUS, METHODS, AND SYSTEMS WITH A CONFIGURABLE SPATIAL ACCELERATOR

    Publication number: US20190205263A1

    Publication date: 2019-07-04

    Application number: US15859454

    Application date: 2017-12-30

    CPC classification number: G06F12/1054 G06F2212/608 G06F2212/683

    Abstract: Systems, methods, and apparatuses relating to a configurable spatial accelerator are described. In one embodiment, a processor includes a core with a decoder to decode an instruction into a decoded instruction and an execution unit to execute the decoded instruction to perform a first operation; a plurality of processing elements; and an interconnect network between the plurality of processing elements to receive an input of a dataflow graph comprising a plurality of nodes, wherein the dataflow graph is to be overlaid into the interconnect network and the plurality of processing elements with each node represented as a dataflow operator in the plurality of processing elements, and the plurality of processing elements are to perform a second operation by a respective, incoming operand set arriving at each of the dataflow operators of the plurality of processing elements.
