-
Publication number: US10768899B2
Publication date: 2020-09-08
Application number: US16260548
Filing date: 2019-01-29
Applicant: SambaNova Systems, Inc.
Inventor: David Alan Koeplinger, Raghu Prabhakar, Ram Sivaramakrishnan, David Brian Jackson, Mark Luttrell
Abstract: A configurable circuit configurable according to the data width of elements of a matrix is described that includes a memory array, logic to write a matrix to the memory array having elements with a data width which can be specified using configuration data, logic for a transpose read of the matrix as-written and logic for normal read of the matrix as-written. The memory array includes first and second read ports operable in parallel. Transpose read logic and normal read logic can be coupled to the first and second read ports, respectively, allowing transpose and normal read of a matrix simultaneously.
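The normal and transpose reads described in this abstract can be illustrated with a toy software model. This is only a sketch under stated assumptions, not the patented circuit: the class name `TransposeBuffer`, its method names, and the choice to model the configurable data width as a bit mask are all hypothetical, and the two hardware read ports are represented simply as two independent read methods.

```python
class TransposeBuffer:
    """Toy model (hypothetical) of a memory array with normal and transpose reads."""

    def __init__(self, rows, cols, data_width):
        self.rows = rows
        self.cols = cols
        # Assumption: the configured data width is modeled as a bit mask on each element.
        self.mask = (1 << data_width) - 1
        self.cells = [[0] * cols for _ in range(rows)]

    def write(self, matrix):
        """Store the matrix as written, truncating each element to the data width."""
        for r in range(self.rows):
            for c in range(self.cols):
                self.cells[r][c] = matrix[r][c] & self.mask

    def normal_read(self, index):
        """Read one row of the matrix as written (stands in for one read port)."""
        return list(self.cells[index])

    def transpose_read(self, index):
        """Read one column of the matrix as written, i.e. one row of its transpose."""
        return [self.cells[r][index] for r in range(self.rows)]


buf = TransposeBuffer(rows=2, cols=3, data_width=8)
buf.write([[1, 2, 3], [4, 5, 300]])  # 300 does not fit in 8 bits and is truncated
row = buf.normal_read(1)
col = buf.transpose_read(2)
```

Both reads operate on the same stored data, so the matrix never has to be rewritten in transposed form.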
-
Publication number: US12105630B2
Publication date: 2024-10-01
Application number: US17582421
Filing date: 2022-01-24
Applicant: SambaNova Systems, Inc.
Inventor: Kevin James Brown, David Alan Koeplinger, Weiwei Chen, Xiaoming Gu
CPC classification number: G06F12/0842, G06F8/447, G06F8/457, G06F15/7892, G06F5/10, G06F11/3072, G06F2205/123, G06F2212/1016, G06F2212/45
Abstract: A dataflow graph for an application has operation units that are configured to be producers and consumers of tensors. A write access pattern of a particular producer specifies an order in which the particular producer generates elements of a tensor, and a read access pattern of a corresponding consumer specifies an order in which the corresponding consumer processes the elements of the tensor. The technology disclosed detects conflicts between the producers and the corresponding consumers that have mismatches between the write access patterns and the read access patterns. A conflict occurs when the order in which the particular producer generates the elements of the tensor is different from the order in which the corresponding consumer processes the elements of the tensor. The technology disclosed resolves the conflicts by inserting buffers between the producers and the corresponding consumers.
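The conflict rule in this abstract (a producer's write order differs from a consumer's read order) can be sketched as follows. This is an illustration under assumptions, not the patented method: access patterns are modeled here as loop-nest orders over tensor dimensions, and the names `access_order`, `has_conflict`, and `insert_buffers` are hypothetical.

```python
from itertools import product


def access_order(shape, loop_order):
    """Return tensor indices in the order visited by a loop nest.

    loop_order lists dimensions from outermost to innermost loop.
    """
    ranges = [range(shape[dim]) for dim in loop_order]
    visited = []
    for combo in product(*ranges):
        index = [0] * len(shape)
        for dim, value in zip(loop_order, combo):
            index[dim] = value
        visited.append(tuple(index))
    return visited


def has_conflict(shape, write_order, read_order):
    """A conflict exists when elements are produced and consumed in different orders."""
    return access_order(shape, write_order) != access_order(shape, read_order)


def insert_buffers(edges, shape):
    """Rewrite producer-to-consumer edges, adding a buffer node on conflicting ones.

    edges is a list of (producer, write_order, consumer, read_order) tuples.
    """
    rewritten = []
    for producer, write_order, consumer, read_order in edges:
        if has_conflict(shape, write_order, read_order):
            buffer_name = f"buffer_{producer}_{consumer}"
            rewritten.append((producer, buffer_name))
            rewritten.append((buffer_name, consumer))
        else:
            rewritten.append((producer, consumer))
    return rewritten


edges = [
    ("matmul", (0, 1), "relu", (0, 1)),       # same order: no buffer needed
    ("relu", (0, 1), "transpose", (1, 0)),    # row-major write, column-major read
]
graph = insert_buffers(edges, shape=(2, 2))
```

Comparing the full index sequences, rather than the loop orders alone, avoids flagging cases where different loop orders happen to visit elements in the same sequence, such as a tensor with a dimension of size 1.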
-
Publication number: US11237971B1
Publication date: 2022-02-01
Application number: US17023015
Filing date: 2020-09-16
Applicant: SambaNova Systems, Inc.
Inventor: Kevin James Brown, David Alan Koeplinger, Weiwei Chen, Xiaoming Gu
Abstract: A dataflow graph for an application has operation units that are configured to be producers and consumers of tensors. A write access pattern of a particular producer specifies an order in which the particular producer generates elements of a tensor, and a read access pattern of a corresponding consumer specifies an order in which the corresponding consumer processes the elements of the tensor. The technology disclosed detects conflicts between the producers and the corresponding consumers that have mismatches between the write access patterns and the read access patterns. A conflict occurs when the order in which the particular producer generates the elements of the tensor is different from the order in which the corresponding consumer processes the elements of the tensor. The technology disclosed resolves the conflicts by inserting buffers between the producers and the corresponding consumers.
-
Publication number: US11714780B2
Publication date: 2023-08-01
Application number: US17326128
Filing date: 2021-05-20
Applicant: SambaNova Systems, Inc.
Inventor: David Alan Koeplinger, Raghu Prabhakar, Sumti Jairath
IPC: G06F15/78, G06F16/901, G06F8/41, G06F12/02
CPC classification number: G06F15/7871, G06F12/023, G06F16/9024
Abstract: The technology disclosed partitions a dataflow graph of a high-level program into memory allocations and execution fragments. The memory allocations represent creation of logical memory spaces in on-processor and/or off-processor memories for data required to implement the dataflow graph. The execution fragments represent operations on the data. The technology disclosed designates the memory allocations to virtual memory units and the execution fragments to virtual compute units. The technology disclosed partitions the execution fragments into memory fragments and compute fragments, and assigns the memory fragments to the virtual memory units and the compute fragments to the virtual compute units. The technology disclosed then allocates the virtual memory units to physical memory units and the virtual compute units to physical compute units. It then places the physical memory units and the physical compute units onto positions in the array of configurable units and routes data and control networks between the placed positions.
-
Publication number: US11080227B2
Publication date: 2021-08-03
Application number: US16536192
Filing date: 2019-08-08
Applicant: SambaNova Systems, Inc.
Inventor: David Alan Koeplinger, Raghu Prabhakar, Sumti Jairath
IPC: G06F15/78, G06F16/901, G06F12/02
Abstract: The technology disclosed partitions a dataflow graph of a high-level program into memory allocations and execution fragments. The memory allocations represent creation of logical memory spaces in on-processor and/or off-processor memories for data required to implement the dataflow graph. The execution fragments represent operations on the data. The technology disclosed designates the memory allocations to virtual memory units and the execution fragments to virtual compute units. The technology disclosed partitions the execution fragments into memory fragments and compute fragments, and assigns the memory fragments to the virtual memory units and the compute fragments to the virtual compute units. The technology disclosed then allocates the virtual memory units to physical memory units and the virtual compute units to physical compute units. It then places the physical memory units and the physical compute units onto positions in the array of configurable units and routes data and control networks between the placed positions.
-
Publication number: US12164463B2
Publication date: 2024-12-10
Application number: US18130667
Filing date: 2023-04-04
Applicant: SambaNova Systems, Inc.
Inventor: David Alan Koeplinger, Weihang Fan
Abstract: A method in a reconfigurable computing system includes receiving a user program for execution on a reconfigurable dataflow computing system, comprising a grid of compute units and grid of memory units interconnected with a switching array. The user program includes multiple tensor-based algebraic expressions that are converted to an intermediate representation comprising one or more logical operations executable via dataflow through compute units. These one or more logical operations are preceded by or followed by a buffer, each buffer corresponding to one or more memory units. The method includes determining whether splitting a selected buffer yields a reduced cost and then splitting the selected buffer, in response to the determining step, to produce first and second buffers. Dataflow through memory units corresponding to the first and second buffers is controlled by one or more memory units within the grid of memory units. Buffer splitting optimization reduces memory unit consumption.
-
Publication number: US11709664B2
Publication date: 2023-07-25
Application number: US16890841
Filing date: 2020-06-02
Applicant: SambaNova Systems, Inc.
Inventor: Weiwei Chen, Raghu Prabhakar, David Alan Koeplinger, Sitanshu Gupta, Ruddhi Arun Chaphekar, Ajit Punj, Sumti Jairath
CPC classification number: G06F8/452, G06F8/41, G06F15/7867, G06F15/825
Abstract: A compiler configured to configure memory nodes with a ready-to-read credit counter and a write credit counter. The ready-to-read credit counter of a particular upstream memory node initialized with as many read credits as a buffer depth of a corresponding downstream memory node. The ready-to-read credit counter configured to decrement when a buffer data unit is written by the particular upstream memory node into the corresponding downstream memory node, and to increment when the particular upstream memory node receives from the corresponding downstream memory node a read ready token. The write credit counter of the particular upstream memory node initialized with one or more write credits and configured to decrement when the particular upstream memory node begins writing the buffer data unit into the corresponding downstream memory node, and to increment when the particular upstream memory node receives from the corresponding downstream memory node a write done token.
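The two counters in this abstract can be traced with a small software model. This is a sketch under assumptions rather than the patented design: the class `UpstreamCredits` and its method names are hypothetical, and the rule that a write may begin only when both counters are positive is an assumption made for illustration, since the abstract states only when each counter increments and decrements.

```python
class UpstreamCredits:
    """Toy model (hypothetical) of an upstream memory node's two credit counters."""

    def __init__(self, downstream_depth, write_credits=1):
        # Ready-to-read credits start at the downstream node's buffer depth.
        self.read_credits = downstream_depth
        # Write credits start at one or more.
        self.write_credits = write_credits

    def can_write(self):
        # Assumption: a write needs at least one credit of each kind.
        return self.read_credits > 0 and self.write_credits > 0

    def begin_write(self):
        """Spend a write credit when writing of a buffer data unit begins."""
        if not self.can_write():
            raise RuntimeError("no credits available")
        self.write_credits -= 1

    def finish_write(self):
        """Spend a ready-to-read credit once the unit is written downstream."""
        self.read_credits -= 1

    def on_write_done_token(self):
        """The downstream node reports that the write has completed."""
        self.write_credits += 1

    def on_read_ready_token(self):
        """The downstream node reports that a buffer slot has been freed."""
        self.read_credits += 1


node = UpstreamCredits(downstream_depth=2)
trace = []
for _ in range(2):                 # fill both downstream buffer slots
    node.begin_write()
    node.finish_write()
    node.on_write_done_token()
trace.append(node.can_write())     # downstream buffer is full
node.on_read_ready_token()         # consumer frees one slot
trace.append(node.can_write())
```

In this model the ready-to-read counter bounds how many units can sit in the downstream buffer, while the write credit bounds how many writes are in flight at once.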
-
Publication number: US11782729B2
Publication date: 2023-10-10
Application number: US16996666
Filing date: 2020-08-18
Applicant: SambaNova Systems, Inc.
Inventor: Gregory Frederick Grohoski, Manish K. Shah, Raghu Prabhakar, Mark Luttrell, Ravinder Kumar, Kin Hing Leung, Ranen Chatterjee, Sumti Jairath, David Alan Koeplinger, Ram Sivaramakrishnan, Matthew Thomas Grimm
CPC classification number: G06F9/44505, G06F9/5016
Abstract: A data processing system comprises a pool of reconfigurable data flow resources and a runtime processor. The pool of reconfigurable data flow resources includes arrays of physical configurable units and memory. The runtime processor includes logic to receive a plurality of configuration files for user applications. The configuration files include configurations of virtual data flow resources required to execute the user applications. The runtime processor also includes logic to allocate physical configurable units and memory in the pool of reconfigurable data flow resources to the virtual data flow resources and load the configuration files to the allocated physical configurable units. The runtime processor further includes logic to execute the user applications using the allocated physical configurable units and memory.
-
Publication number: US11645057B2
Publication date: 2023-05-09
Application number: US17031679
Filing date: 2020-09-24
Applicant: SambaNova Systems, Inc.
Inventor: David Alan Koeplinger, Weiwei Chen, Kevin James Brown, Xiaoming Gu
IPC: G06F8/41, G06F12/0842
CPC classification number: G06F8/443, G06F8/433, G06F12/0842
Abstract: A dataflow graph has operation units that are configured to be producer operation units to produce tensors for execution of the application, and to be consumer operation units to consume the tensors for execution of the application. Compile time logic is configured to process the dataflow graph to determine, for the tensors, expected producer memory layouts, expected consumer memory layouts, and current memory layouts. The expected producer memory layouts specify memory layouts required by the producer operation units that produce the tensors. The expected consumer memory layouts specify the memory layouts required by the consumer operation units that consume the tensors. The current memory layouts specify the memory layouts of the tensors. Each of the memory layouts includes a vector dimension and at least one of a vector ordering and a data alignment.
-
Publication number: US20200241844A1
Publication date: 2020-07-30
Application number: US16260548
Filing date: 2019-01-29
Applicant: SambaNova Systems, Inc.
Inventor: David Alan Koeplinger, Raghu Prabhakar, Ram Sivaramakrishnan, David Brian Jackson, Mark Luttrell
Abstract: A configurable circuit configurable according to the data width of elements of a matrix is described that includes a memory array, logic to write a matrix to the memory array having elements with a data width which can be specified using configuration data, logic for a transpose read of the matrix as-written and logic for normal read of the matrix as-written. The memory array includes first and second read ports operable in parallel. Transpose read logic and normal read logic can be coupled to the first and second read ports, respectively, allowing transpose and normal read of a matrix simultaneously.