-
Publication number: US20240168915A1
Publication date: 2024-05-23
Application number: US18202059
Application date: 2023-05-25
Applicant: SambaNova Systems, Inc.
Inventor: Yun DU, Gao DENG, Jianding LUO, Zhengyu CHEN
CPC classification number: G06F15/825, G06F9/3867
Abstract: A method for reducing latency and increasing throughput in a reconfigurable computing system includes receiving a compute graph for execution on a reconfigurable dataflow processor comprising a grid of compute units and grid of memory units interconnected with a switching array. The compute graph includes a node specifying an operation on a tensor. The node may be split into multiple nodes that each specify the operation on a distinctive portion of the tensor to produce a first modified compute graph. The first modified compute graph may be executed. In addition, the multiple nodes may be within a single meta-pipeline stage and may be processed in parallel. Furthermore, the compute graph may further comprise a separate node for gathering the distinctive portions of the tensor into a complete tensor, to produce a second modified compute graph.
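Illustrative note (not part of the patent text): the node-splitting idea in this abstract can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the `Node` class, the `split_node` and `run_and_gather` helpers, the use of a flat list as the "tensor", and the restriction to element-wise operations (which is what makes slicing safe in this toy model). The abstract does not say how the actual compiler chooses or represents the portions.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Node:
    """Hypothetical graph node: applies `op` to tensor[start:stop]."""
    name: str
    op: Callable[[List[float]], List[float]]
    start: int
    stop: int


def split_node(node: Node, parts: int) -> List[Node]:
    """Split one node into `parts` nodes, each covering a distinct slice."""
    length = node.stop - node.start
    base, extra = divmod(length, parts)
    nodes, begin = [], node.start
    for i in range(parts):
        # Spread any remainder over the first `extra` slices.
        size = base + (1 if i < extra else 0)
        nodes.append(Node(f"{node.name}_{i}", node.op, begin, begin + size))
        begin += size
    return nodes


def run_and_gather(nodes: List[Node], tensor: List[float]) -> List[float]:
    """Run each split node on its slice, then gather into a complete tensor.

    The loop is sequential here; in the abstract the split nodes may be
    processed in parallel within a single meta-pipeline stage.
    """
    out: List[float] = []
    for n in nodes:
        out.extend(n.op(tensor[n.start:n.stop]))
    return out


def relu(xs: List[float]) -> List[float]:
    """Element-wise example operation."""
    return [max(x, 0.0) for x in xs]


tensor = [-2.0, -1.0, 0.0, 1.0, 2.0, 3.0, 4.0]
original = Node("relu", relu, 0, len(tensor))
pieces = split_node(original, 3)
gathered = run_and_gather(pieces, tensor)
```

In this sketch the gathered result equals what the unsplit node would produce, because the example operation is element-wise; operations with cross-element dependencies would need a different partitioning.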
-
Publication number: US20230385231A1
Publication date: 2023-11-30
Application number: US18199572
Application date: 2023-05-19
Applicant: SambaNova Systems, Inc.
Inventor: Yun DU, Jianding LUO
CPC classification number: G06F15/7878, G06F8/433
Abstract: A data processing system includes an array of reconfigurable units and a compiler configured to generate a pipeline of n computational nodes related to a dataflow graph, interleaved between n+1 buffers on the array of reconfigurable units. Each computational node is coupled to perform calculations based on data received from an immediately preceding buffer of the n+1 buffers and store results of the calculations into an immediately following buffer of the n+1 buffers after a latency. The compiler is further configured to remove a buffer of the n+1 buffers from the pipeline based on a comparison of the latencies of the computational nodes. A corresponding method is also disclosed herein.
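Illustrative note (not part of the patent text): the abstract says only that a buffer is removed "based on a comparison of the latencies" and does not give the rule. The sketch below assumes one plausible rule for illustration: an interior buffer may be dropped when merging its two neighbouring stages yields a combined latency no greater than the slowest single node, so the pipeline bottleneck is unchanged. The function name `removable_buffers` and the buffer indexing are hypothetical.

```python
from typing import List


def removable_buffers(latencies: List[int]) -> List[int]:
    """Return indices of interior buffers that could be removed.

    With n nodes there are n+1 buffers, numbered 0..n. Buffer i
    (for 1 <= i <= n-1) sits between node i-1 and node i. Buffers 0 and n
    are the pipeline input and output and are never removed here.

    Assumed rule (not from the patent): remove buffer i when the merged
    stage ending at node i has latency <= the bottleneck node latency.
    """
    bottleneck = max(latencies)
    removed: List[int] = []
    merged = latencies[0]  # latency of the stage currently being grown
    for i in range(1, len(latencies)):
        if merged + latencies[i] <= bottleneck:
            merged += latencies[i]   # fuse node i into the current stage
            removed.append(i)        # buffer i is no longer needed
        else:
            merged = latencies[i]    # keep buffer i, start a new stage
    return removed


# Five nodes, six buffers; the node with latency 10 is the bottleneck.
example = removable_buffers([2, 3, 10, 4, 1])
```

Under this assumed rule the example removes buffers 1 and 4, leaving stages with latencies 5, 10, and 5.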
-
Publication number: US20230297349A1
Publication date: 2023-09-21
Application number: US18121766
Application date: 2023-03-15
Applicant: SambaNova Systems, Inc.
Inventor: Gao DENG, Weihang FAN, Fei WANG, Yun DU
IPC: G06F8/41
CPC classification number: G06F8/433
Abstract: A computer-implemented method of transforming a high-level program for mapping onto a coarse-grained reconfigurable (CGR) processor with an array of CGR units, including sectioning a dataflow graph into a plurality of sections; extracting performance information for each of the plurality of sections; on a CGR unit: assigning to a section at least two computations dependent on a first data element; scheduling an additional load of the first data element in response to available memory bandwidth for that section; eliminating a buffer between the additional load of the first data element and one of the two computations, for that section; generating configuration data for the array of CGR units and communication channels, wherein the configuration data, when loaded onto an instance of the array of CGR units, causes the array of CGR units to implement the dataflow graph; and storing the configuration data in a non-transitory computer-readable storage medium.
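Illustrative note (not part of the patent text): the trade described in this abstract, spending spare memory bandwidth on a second load so that one consumer no longer needs a buffer, can be sketched as a small planning function. The `Section` and `Edge` types, the bandwidth fractions, and the `plan_loads` helper are all hypothetical; the abstract does not specify how bandwidth is measured or how the decision is made.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Section:
    """Hypothetical per-section performance information."""
    consumers: Tuple[str, str]  # two computations needing the same data element
    bandwidth_used: float       # fraction of memory bandwidth in use, 0..1
    load_cost: float            # bandwidth fraction one extra load would add


@dataclass(frozen=True)
class Edge:
    load: str        # which load feeds the consumer
    consumer: str
    buffered: bool   # True if a buffer sits between load and consumer


def plan_loads(section: Section) -> List[Edge]:
    """Decide whether to schedule an additional load for this section."""
    a, b = section.consumers
    spare = 1.0 - section.bandwidth_used
    if section.load_cost <= spare:
        # Enough bandwidth: the additional load feeds `b` directly, so the
        # buffer between that load and `b` is eliminated.
        return [Edge("load0", a, True), Edge("load1", b, False)]
    # Not enough bandwidth: one load is shared and both consumers stay buffered.
    return [Edge("load0", a, True), Edge("load0", b, True)]


roomy = plan_loads(Section(("matmul", "add"), bandwidth_used=0.5, load_cost=0.25))
tight = plan_loads(Section(("matmul", "add"), bandwidth_used=0.9, load_cost=0.25))
```

Here `roomy` ends up with two loads and one unbuffered edge, while `tight` keeps the single shared load with both edges buffered.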
-
-