-
Publication No.: US11841811B2
Publication date: 2023-12-12
Application No.: US17479906
Filing date: 2021-09-20
Inventors: Raghu Prabhakar, Matthew Thomas Grimm, Sumti Jairath, Kin Hing Leung, Sitanshu Gupta, Yuan Lin, Luca Boasso
IPC classification: G06F12/0806, G06F13/20, G06F15/78
CPC classification: G06F13/20, G06F15/7867
Abstract: A reconfigurable processor comprises an array of processing units and an instrumentation network. The array of processing units is configured to execute runtime events to execute an application. The instrumentation network is operatively coupled to the array of processing units. The instrumentation network comprises a control bus configured to form control signal routes in the instrumentation network. The instrumentation network further comprises a plurality of instrumentation counters having inputs and outputs connected to the control bus and to the processing units. Instrumentation counters in the plurality of instrumentation counters are configurable to consume control signals on the inputs and produce counts of the runtime events on the outputs.
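A minimal sketch of one counter, assuming (this is not stated in the abstract) that the control signals routed over the control bus are a "start" and a "stop" pulse and that the counted event is a clock cycle while the counter is running. The names `InstrumentationCounter` and `measure` are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass
class InstrumentationCounter:
    """Hypothetical counter: counts cycles between a start and a stop signal."""
    count: int = 0
    running: bool = False

    def on_clock(self, start: bool = False, stop: bool = False) -> None:
        # Consume this cycle's control signals from the (assumed) control bus.
        if start:
            self.running = True
        if self.running:
            self.count += 1  # the cycles carrying start and stop are both counted
        if stop:
            self.running = False


def measure(signal_trace: Iterable[Tuple[bool, bool]]) -> int:
    """Replay one (start, stop) tuple per cycle and return the final count."""
    counter = InstrumentationCounter()
    for start, stop in signal_trace:
        counter.on_clock(start=start, stop=stop)
    return counter.count
```

For example, a start pulse in cycle 1 and a stop pulse in cycle 4 yields a count of 4 (cycles 1 through 4 inclusive).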
-
Publication No.: US11561925B2
Publication date: 2023-01-24
Application No.: US17476749
Filing date: 2021-09-16
Inventors: Raghu Prabhakar, Nathan Francis Sheeley, Matheen Musaddiq, Scott Layson Burson, Sitanshu Gupta, Sumti Jairath, Pramod Nataraja, Ajit Punj
Abstract: A method of processing partitions of a tensor in a target order includes receiving, by a reorder unit and from two or more producer units, a plurality of partitions of a tensor in a first order that is different from the target order, storing the plurality of partitions in the reorder unit, and providing, from the reorder unit, the plurality of partitions in the target order to one or more consumer units. In an example, the one or more consumer units process the plurality of partitions in the target order.
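A minimal sketch of the reorder step, assuming each partition arrives tagged with its position in the target order. The function `reorder` is hypothetical; it buffers partitions and releases each one as soon as every earlier partition has been emitted.

```python
from typing import Dict, Iterable, Iterator, Tuple, TypeVar

T = TypeVar("T")


def reorder(arrivals: Iterable[Tuple[int, T]], num_partitions: int) -> Iterator[T]:
    """Hypothetical reorder unit.

    `arrivals` yields (target_index, partition) pairs in whatever order the
    producers deliver them; partitions are released in target order 0, 1, 2, ...
    """
    buffer: Dict[int, T] = {}
    next_index = 0
    for index, partition in arrivals:
        buffer[index] = partition  # store until its turn comes
        # Release every buffered partition that is now next in target order.
        while next_index in buffer:
            yield buffer.pop(next_index)
            next_index += 1
    if next_index != num_partitions:
        raise ValueError("missing partitions: target order cannot be completed")
```

Streaming the output this way lets a consumer start on partition 0 before later partitions have arrived, at the cost of buffering any partition that arrives early.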
-
Publication No.: US20220156213A1
Publication date: 2022-05-19
Application No.: US17589467
Filing date: 2022-01-31
Inventors: Gregory Frederick Grohoski, Sumti Jairath, Mark Luttrell, Raghu Prabhakar, Ram Sivaramakrishnan, Manish K. Shah
Abstract: A reconfigurable data processor includes a plurality of configurable units and a configuration controller. The configuration controller is configured to start execution of a first application graph in a first set of configurable units. Then, concurrently with the execution of the first application graph in the first set of configurable units, the configuration controller receives a command to load a configuration file into a second set of configurable units and obtains the configuration file. The configuration file contains information to configure the second set of configurable units to execute a second application graph. The configuration file is then loaded into the second set of configurable units, and execution of the second application graph is started in the second set of configurable units.
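A minimal sequential state model of the load-while-running sequence, assuming configurable units can be identified by integer ids and that a unit already running a graph must not be reconfigured. The classes `UnitState` and `ConfigurationController` are hypothetical; the model does not simulate real concurrency, it only checks that loading the second set leaves the first set running.

```python
from enum import Enum
from typing import Dict, Iterable


class UnitState(Enum):
    IDLE = "idle"
    CONFIGURED = "configured"
    RUNNING = "running"


class ConfigurationController:
    """Hypothetical controller model; configurable units are integer ids."""

    def __init__(self, num_units: int) -> None:
        self.state: Dict[int, UnitState] = {u: UnitState.IDLE for u in range(num_units)}
        self.config: Dict[int, str] = {}

    def load(self, units: Iterable[int], config_file: str) -> None:
        units = list(units)
        # Only the target set is touched, so a graph running on other units
        # keeps executing while this load is in progress.
        if any(self.state[u] is UnitState.RUNNING for u in units):
            raise RuntimeError("target units are still executing a graph")
        for u in units:
            self.config[u] = config_file
            self.state[u] = UnitState.CONFIGURED

    def start(self, units: Iterable[int]) -> None:
        units = list(units)
        if any(self.state[u] is not UnitState.CONFIGURED for u in units):
            raise RuntimeError("every unit must be configured before starting")
        for u in units:
            self.state[u] = UnitState.RUNNING
```

Usage: load and start `range(0, 4)` with one configuration file, then load `range(4, 8)` with a second file while the first set remains `RUNNING`, and finally start the second set.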
-
Publication No.: US20240220325A1
Publication date: 2024-07-04
Application No.: US18603156
Filing date: 2024-03-12
Inventors: Raghu Prabhakar, Manish K. Shah, Pramod Nataraja, David Brian Jackson, Kin Hing Leung, Ram Sivaramakrishnan, Sumti Jairath, Gregory Frederick Grohoski
CPC classification: G06F9/5027, G06F15/80, G06F2209/506
Abstract: A computer system includes an array of reconfigurable processor blocks that execute fragments of a larger data processing operation. An array controller distributes a control signal to the reconfigurable processors in the array and receives control signals for the respective execution fragments. The control signal may include quiesce logic or other control mechanisms for executing the execution fragments of the larger data processing operation as individual processors become available.
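A minimal sketch of dispatching execution fragments as processors become available, assuming each fragment has a known duration and that a processor's completion signal is modeled as the time it becomes free. The function `dispatch` is hypothetical, and quiesce behavior is not modeled; this is plain greedy list scheduling with a priority queue.

```python
import heapq
from typing import List, Tuple


def dispatch(fragment_durations: List[int], num_processors: int) -> Tuple[List[Tuple[int, int, int]], int]:
    """Hypothetical array-controller model.

    The next pending fragment goes to whichever processor signals completion
    first. Returns (fragment_id, processor_id, start_time) triples and the
    overall finish time.
    """
    # Min-heap of (time the processor becomes available, processor id).
    free = [(0, p) for p in range(num_processors)]
    heapq.heapify(free)
    schedule: List[Tuple[int, int, int]] = []
    finish = 0
    for frag_id, duration in enumerate(fragment_durations):
        available_at, proc = heapq.heappop(free)  # earliest completion signal
        schedule.append((frag_id, proc, available_at))
        done = available_at + duration
        finish = max(finish, done)
        heapq.heappush(free, (done, proc))  # processor reports done at `done`
    return schedule, finish
```

With durations `[3, 1, 2, 2]` on two processors, fragment 2 starts at time 1 on the processor that finished fragment 1, and all fragments complete at time 5.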
-
Publication No.: US12001936B2
Publication date: 2024-06-04
Application No.: US17700452
Filing date: 2022-03-21
Inventors: Tejas Nagendra Babu Nama, Ruddhi Chaphekar, Ram Sivaramakrishnan, Raghu Prabhakar, Sumti Jairath, Junjue Wang, Kaizhao Liang, Adi Fuchs, Matheen Musaddiq, Arvind Krishna Sujeeth
IPC classification: G06N3/04
CPC classification: G06N3/04
Abstract: A processing graph of an application is obtained. The graph has a sequence of processing nodes that processes an input and generates an intermediate representation, a further intermediate representation, and an output representation of the input at stages in the sequence of processing nodes. Graph metadata is generated that specifies a non-overlapping target tiling configuration for the output representation, an overlapping tiling configuration for the input, an overlapping tiling configuration for the intermediate representation, and a third tiling configuration for the further intermediate representation. The processing graph is modified based on the graph metadata to conform to the parameters specified by the graph metadata. A set of computer instructions is then created to execute the modified processing graph on a target processing system.
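A minimal 1D sketch of why non-overlapping output tiles imply overlapping input tiles, assuming every processing node is a stride-1, unpadded ("valid") convolution, so output element `i` reads inputs `i` through `i + k - 1`. The helper names are hypothetical, and tile ranges are half-open `[start, end)`.

```python
from typing import List, Tuple

Range = Tuple[int, int]  # half-open [start, end)


def output_tiles(length: int, tile: int) -> List[Range]:
    """Non-overlapping target tiling of the output representation."""
    return [(s, min(s + tile, length)) for s in range(0, length, tile)]


def required_input(out_range: Range, kernel: int) -> Range:
    """Input range a stride-1 valid convolution needs to produce out_range."""
    start, end = out_range
    return (start, end + kernel - 1)


def tile_chain(length_out: int, tile: int, kernels: List[int]) -> List[List[Range]]:
    """Propagate tiles backward from the output through layers listed input-to-output.

    Returns one tiling per representation, ordered from input to output.
    """
    tiles = output_tiles(length_out, tile)
    per_stage = [tiles]
    for k in reversed(kernels):  # walk from the last layer back to the first
        tiles = [required_input(r, k) for r in tiles]
        per_stage.append(tiles)
    return list(reversed(per_stage))
```

For kernels `[3, 5]` and an output of length 8 split into tiles of 4, the output tiles `(0, 4)` and `(4, 8)` require intermediate tiles `(0, 8)` and `(4, 12)` and input tiles `(0, 10)` and `(4, 14)`, so the upstream tilings overlap even though the output tiling does not.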
-
Publication No.: US11816560B2
Publication date: 2023-11-14
Application No.: US17883407
Filing date: 2022-08-08
Inventors: Zhuo Chen, Sumti Jairath
IPC classification: G06N3/063, G06F16/904, G06F15/78
CPC classification: G06N3/063, G06F15/7892, G06F16/904
Abstract: The technology disclosed relates to allocating available physical compute units (PCUs) and/or physical memory units (PMUs) of a reconfigurable data processor to operation units of an operation unit graph for execution thereof. In particular, it relates to selecting, for evaluation, an intermediate stage compute processing time between lower and upper search bounds of a generic stage compute processing time, determining a pipeline number of the PCUs and/or the PMUs required to process the operation unit graph, and iteratively initializing new lower and upper search bounds of the generic stage compute processing time and selecting, for evaluation in the next iteration, a new intermediate stage compute processing time, taking into account whether the pipeline number of the PCUs and/or the PMUs produced for the prior intermediate stage compute processing time in the previous iteration is lower or higher than the number of available PCUs and/or PMUs.
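A minimal binary-search sketch of the bound-narrowing loop, assuming a simple cost model that is not from the abstract: an operation with latency `L` (on one unit) needs `ceil(L / t)` parallel units to fit in a stage time `t`, and the pipeline number is the sum over all operations. The functions `units_needed` and `min_stage_time` are hypothetical.

```python
import math
from typing import List


def units_needed(op_latencies: List[int], stage_time: int) -> int:
    """Hypothetical pipeline number: units required so every op fits in stage_time."""
    return sum(math.ceil(lat / stage_time) for lat in op_latencies)


def min_stage_time(op_latencies: List[int], available_units: int) -> int:
    """Smallest integer stage time whose pipeline number fits the available units."""
    if available_units < len(op_latencies):
        raise ValueError("need at least one unit per operation")
    # Upper bound: at max latency every op needs exactly one unit, which fits.
    lower, upper = 1, max(op_latencies)
    while lower < upper:
        mid = (lower + upper) // 2  # intermediate stage time to evaluate
        if units_needed(op_latencies, mid) > available_units:
            lower = mid + 1  # too many units: allow a longer stage time
        else:
            upper = mid  # fits: try a shorter stage time
    return lower
```

For latencies `[8, 4, 2]` and 7 available units, a stage time of 1 needs 14 units while a stage time of 2 needs exactly 7, so the search returns 2. The search is valid because the unit count never increases as the stage time grows.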
-
Publication No.: US11709664B2
Publication date: 2023-07-25
Application No.: US16890841
Filing date: 2020-06-02
Inventors: Weiwei Chen, Raghu Prabhakar, David Alan Koeplinger, Sitanshu Gupta, Ruddhi Arun Chaphekar, Ajit Punj, Sumti Jairath
CPC classification: G06F8/452, G06F8/41, G06F15/7867, G06F15/825
Abstract: A compiler is configured to configure memory nodes with a ready-to-read credit counter and a write credit counter. The ready-to-read credit counter of a particular upstream memory node is initialized with as many read credits as the buffer depth of a corresponding downstream memory node. The ready-to-read credit counter is configured to decrement when a buffer data unit is written by the particular upstream memory node into the corresponding downstream memory node, and to increment when the particular upstream memory node receives a read ready token from the corresponding downstream memory node. The write credit counter of the particular upstream memory node is initialized with one or more write credits and is configured to decrement when the particular upstream memory node begins writing the buffer data unit into the corresponding downstream memory node, and to increment when the particular upstream memory node receives a write done token from the corresponding downstream memory node.
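A minimal sketch of the two counters on an upstream memory node, following the increment and decrement rules in the abstract. The class `UpstreamMemoryNode` and its method names are hypothetical, and a write is modeled as one atomic step (begin, then complete) so that the read-credit check cannot be over-committed.

```python
class UpstreamMemoryNode:
    """Hypothetical model of the credit counters on an upstream memory node."""

    def __init__(self, downstream_buffer_depth: int, write_credits: int = 1) -> None:
        # One read credit per buffer slot in the downstream memory node.
        self.read_credits = downstream_buffer_depth
        self.write_credits = write_credits

    def can_write(self) -> bool:
        return self.read_credits > 0 and self.write_credits > 0

    def write_buffer_unit(self) -> None:
        if not self.can_write():
            raise RuntimeError("stall: wait for a credit token")
        self.write_credits -= 1  # decremented when the write begins
        self.read_credits -= 1   # decremented once the data unit is written

    def on_read_ready_token(self) -> None:
        self.read_credits += 1   # downstream has freed a buffer slot

    def on_write_done_token(self) -> None:
        self.write_credits += 1  # downstream has acknowledged the write
```

With a downstream buffer depth of 2 and one write credit, the node can issue two writes (each followed by a write done token) and then stalls until a read ready token returns a read credit.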
-
Publication No.: US11681645B2
Publication date: 2023-06-20
Application No.: US17589467
Filing date: 2022-01-31
Inventors: Gregory Frederick Grohoski, Sumti Jairath, Mark Luttrell, Raghu Prabhakar, Ram Sivaramakrishnan, Manish K. Shah
CPC classification: G06F13/4027, G06F9/45533, G06F12/10, G06F13/1668, G06F15/7839, G06F15/7882, G06F2212/657
Abstract: A reconfigurable data processor includes a plurality of configurable units and a configuration controller. The configuration controller is configured to start execution of a first application graph in a first set of configurable units. Then, concurrently with the execution of the first application graph in the first set of configurable units, the configuration controller receives a command to load a configuration file into a second set of configurable units and obtains the configuration file. The configuration file contains information to configure the second set of configurable units to execute a second application graph. The configuration file is then loaded into the second set of configurable units, and execution of the second application graph is started in the second set of configurable units.
-
Publication No.: US11232360B1
Publication date: 2022-01-25
Application No.: US17216655
Filing date: 2021-03-29
Inventors: Tejas Nagendra Babu Nama, Ruddhi Chaphekar, Ram Sivaramakrishnan, Raghu Prabhakar, Sumti Jairath, Junjue Wang, Kaizhao Liang, Adi Fuchs, Matheen Musaddiq, Arvind Krishna Sujeeth
Abstract: Disclosed is a data processing system that includes compile-time logic configured to process a processing graph to generate a modified processing graph, which includes a plurality of forward processing nodes of a forward pass and a plurality of backward processing nodes of a backward pass. The data processing system also includes runtime logic configured with the compile-time logic to execute the modified processing graph to generate, at a backward processing node of the plurality of backward processing nodes, a plurality of partial weight gradients, based on processing a corresponding plurality of gradient tiles of a gradient tensor, and generate, based on the plurality of partial weight gradients, a final weight gradient corresponding to the gradient tensor.
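A minimal sketch of summing partial weight gradients, assuming the backward node belongs to a dense layer `y = x @ W`, whose weight gradient is `x^T @ dY`, and that the gradient tensor `dY` is tiled by rows (in the patent the tiles may instead be spatial tiles of a convolution). The helpers `matmul_t` and `tiled_weight_gradient` are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def matmul_t(a: Matrix, b: Matrix) -> Matrix:
    """Return a^T @ b for row-major lists of lists with equal row counts."""
    rows, cols_a, cols_b = len(a), len(a[0]), len(b[0])
    out = [[0.0] * cols_b for _ in range(cols_a)]
    for r in range(rows):
        for i in range(cols_a):
            for j in range(cols_b):
                out[i][j] += a[r][i] * b[r][j]
    return out


def tiled_weight_gradient(x: Matrix, grad_y: Matrix, tile_rows: int) -> Matrix:
    """dW = x^T @ dY, computed as one partial gradient per row tile of dY."""
    partials = [
        matmul_t(x[s:s + tile_rows], grad_y[s:s + tile_rows])
        for s in range(0, len(x), tile_rows)
    ]
    # Final weight gradient: elementwise sum of the partial weight gradients.
    final = [[0.0] * len(grad_y[0]) for _ in range(len(x[0]))]
    for partial in partials:
        for i, row in enumerate(partial):
            for j, value in enumerate(row):
                final[i][j] += value
    return final
```

Because `x^T @ dY` is a sum over rows, splitting the rows into tiles and adding the per-tile products gives the same result as the untiled product; for `x = [[1, 2], [3, 4], [5, 6]]` and `dY = [[1], [0], [2]]`, both give `[[11.0], [14.0]]`.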
-
Publication No.: US11188497B2
Publication date: 2021-11-30
Application No.: US16198086
Filing date: 2018-11-21
Inventors: Manish K. Shah, Ram Sivaramakrishnan, Mark Luttrell, David Brian Jackson, Raghu Prabhakar, Sumti Jairath, Gregory Frederick Grohoski, Pramod Nataraja
IPC classification: G06F9/445, G06F15/78, G06F13/364
Abstract: A reconfigurable data processor comprises a bus system and an array of configurable units connected to the bus system. Configurable units in the array include configuration data stores to store unit files comprising a plurality of sub-files of configuration data particular to the corresponding configurable units. Configurable units in the plurality of configurable units each include logic to execute a unit configuration load process, including receiving, via the bus system, sub-files of a unit file particular to the configurable unit, and loading the received sub-files into the configuration store of the configurable unit. A configuration load controller connected to the bus system includes logic to execute an array configuration load process, including distributing a configuration file comprising unit files for a plurality of the configurable units in the array.
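A minimal sketch of the two load processes, assuming the controller sends sub-file `i` of every unit file before sub-file `i + 1` (an interleaving chosen for illustration) and that each unit accepts its own sub-files strictly in order. The names `distribute` and `ConfigurableUnit` are hypothetical.

```python
from typing import Dict, Iterator, List, Tuple


def distribute(configuration_file: Dict[str, List[bytes]]) -> Iterator[Tuple[str, int, bytes]]:
    """Hypothetical array configuration load: yield (unit_id, index, sub_file)
    packets, sending sub-file i of every unit file before sub-file i + 1."""
    rounds = max((len(subs) for subs in configuration_file.values()), default=0)
    for i in range(rounds):
        for unit_id, subs in configuration_file.items():
            if i < len(subs):
                yield unit_id, i, subs[i]


class ConfigurableUnit:
    """Hypothetical unit configuration load into a local configuration store."""

    def __init__(self, expected_subfiles: int) -> None:
        self.expected = expected_subfiles
        self.store: List[bytes] = []

    def receive(self, index: int, sub_file: bytes) -> None:
        # Sub-files for this unit must arrive in order.
        if index != len(self.store):
            raise ValueError("sub-file out of order")
        self.store.append(sub_file)

    @property
    def loaded(self) -> bool:
        return len(self.store) == self.expected
```

Usage: build one `ConfigurableUnit` per unit file, then route each packet from `distribute(...)` to `units[unit_id].receive(index, sub_file)`. A unit with a shorter unit file finishes loading in an earlier round.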
-