-
Publication No.: US20230273818A1
Publication date: 2023-08-31
Application No.: US18119315
Filing date: 2023-03-09
Applicant: Ascenium, Inc.
Inventor(s): Peter Foley
CPC classification: G06F9/4881, G06F8/41
Abstract: Techniques for task processing based on a highly parallel processing architecture with out-of-order resolution are disclosed. A two-dimensional array of compute elements is accessed. Each compute element within the array of compute elements is known to a compiler and is coupled to its neighboring compute elements within the array of compute elements. The array of compute elements is coupled to supporting logic and to memory, which, along with the array of compute elements, comprise compute hardware. A set of directions is provided to the hardware, through a control word generated by the compiler, for compute element operation. The set of directions is augmented with data access ordering information. The data access ordering is performed by the hardware. A compiled task is executed on the array of compute elements, based on the set of directions that was augmented.
-
Publication No.: US20230281014A1
Publication date: 2023-09-07
Application No.: US18195407
Filing date: 2023-05-10
Applicant: Ascenium, Inc.
Inventor(s): Peter Foley
IPC classification: G06F9/30
CPC classification: G06F9/30065, G06F9/30043
Abstract: Techniques for parallel processing of multiple loops with loads and stores are disclosed. A two-dimensional array of compute elements is accessed. Each compute element within the array is known to a compiler and is coupled to its neighboring compute elements within the array. Control for the compute elements is provided on a cycle-by-cycle basis. Control is enabled by a stream of wide control words generated by the compiler. Memory access operations are tagged with precedence information. The tagging is contained in the control words and is implemented for loop operations. The tagging is provided by the compiler at compile time. Control word data is loaded for multiple, independent loops into the compute elements. The multiple, independent loops are executed. Memory is accessed based on the precedence information. The memory access includes loads and/or stores for data relating to the independent loops.
-
Publication No.: US20230376447A1
Publication date: 2023-11-23
Application No.: US18228001
Filing date: 2023-07-31
Applicant: Ascenium, Inc.
Inventor(s): Peter Foley
IPC classification: G06F15/80, G06F12/0842
CPC classification: G06F15/80, G06F12/0842, G06F2212/452
Abstract: Techniques for parallel processing based on a parallel processing architecture with dual load buffers are disclosed. A two-dimensional array of compute elements is accessed. Each compute element is known to a compiler and is coupled to its neighboring compute elements. A first data cache is coupled to the array. The first data cache enables loading data to a first portion of the array. The first data cache supports an address space. A second data cache is coupled to the array. The second data cache enables loading data to a second portion of the array. The second data cache supports the address space. Instructions are executed within the array. Instructions executed within the first portion of the array of compute elements use data loaded from the first data cache, and instructions executed within the second portion of the array of compute elements use data loaded from the second data cache.
-
Publication No.: US20220291957A1
Publication date: 2022-09-15
Application No.: US17752898
Filing date: 2022-05-25
Applicant: Ascenium, Inc.
Inventor(s): Peter Foley
Abstract: Techniques for task processing based on a parallel processing architecture with distributed register files are disclosed. A two-dimensional array of compute elements is accessed. Each compute element is known to a compiler and is coupled to its neighboring compute elements. The array of compute elements is controlled on a cycle-by-cycle basis. The controlling is enabled by a stream of wide, variable length, control words generated by the compiler. Virtual registers are mapped to a plurality of physical register files distributed among one or more of the compute elements. Virtual registers are represented by the compiler. The mapping is performed by the compiler. A broadcast write operation is enabled to two or more of the physical register files. Operations contained in the control words are executed. Operations are enabled by at least one of the distributed physical register files. Implementation in separate compute elements enables parallel operation processing.
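Editorial illustration (not part of the patent record): the register mapping in this abstract can be pictured with a small sketch in which virtual registers are assigned to slots in per-compute-element register files and a broadcast write updates every mapped copy. The class name `DistributedRegisterFiles`, its method names, and the array and file sizes are hypothetical and are not taken from the patent.

```python
from collections import defaultdict


class DistributedRegisterFiles:
    """Toy model: one small physical register file per compute element."""

    def __init__(self, num_elements, regs_per_file):
        # Hypothetical sizes; each compute element owns one register file.
        self.files = [[0] * regs_per_file for _ in range(num_elements)]
        # Compile-time map: virtual register -> list of (element, slot).
        self.mapping = defaultdict(list)

    def map_virtual(self, vreg, element, slot):
        """Record (as a compiler might) where a virtual register lives."""
        self.mapping[vreg].append((element, slot))

    def broadcast_write(self, vreg, value):
        """Write the value to every physical copy of the virtual register."""
        for element, slot in self.mapping[vreg]:
            self.files[element][slot] = value

    def read(self, vreg, element):
        """Read the copy held in the given element's own register file."""
        for mapped_element, slot in self.mapping[vreg]:
            if mapped_element == element:
                return self.files[element][slot]
        raise KeyError(f"{vreg} is not mapped to compute element {element}")


rf = DistributedRegisterFiles(num_elements=4, regs_per_file=8)
rf.map_virtual("v0", element=0, slot=3)
rf.map_virtual("v0", element=2, slot=5)
rf.broadcast_write("v0", 42)
local_copies = [rf.read("v0", 0), rf.read("v0", 2)]
```

In this toy model, compute elements 0 and 2 each read their own copy of `v0` after a single broadcast write, while the register files of the other elements are left untouched.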
-
Publication No.: US20220075651A1
Publication date: 2022-03-10
Application No.: US17526003
Filing date: 2021-11-15
Applicant: Ascenium, Inc.
Inventor(s): Øyvind Harboe, Tore Bastiansen, Peter Foley
Abstract: Techniques for task processing using a highly parallel processing architecture with a compiler are disclosed. A two-dimensional array of compute elements is accessed. Each compute element within the array of compute elements is known to a compiler and is coupled to its neighboring compute elements within the array of compute elements. A set of directions is provided to the hardware, through a control word generated by the compiler, for compute element operation and memory access precedence. The set of directions enables the hardware to properly sequence compute element results. The set of directions controls data movement for the array of compute elements. A compiled task is executed on the array of compute elements, based on the set of directions. The compute element results are generated in parallel in the array, and the compute element results are ordered independently from control word arrival at each compute element.
-
Publication No.: US20230031902A1
Publication date: 2023-02-02
Application No.: US17963226
Filing date: 2022-10-11
Applicant: Ascenium, Inc.
Inventor(s): Peter Foley
Abstract: Techniques for task processing based on load latency amelioration using bunch buffers are disclosed. A two-dimensional array of compute elements is accessed. Each compute element within the array of compute elements is known to a compiler and is coupled to its neighboring compute elements within the array of compute elements. Control for the compute elements is provided on a cycle-by-cycle basis. The control is enabled by a stream of wide control words generated by the compiler. Sets of control word bits are loaded into buffers. Each buffer is associated with and coupled to a unique compute element within the array of compute elements. The sets of control word bits provide operational control for the compute element with which it is associated. Operations are executed within the array of elements. The operations are based on a selected set of control word bits which comprise a control word bunch.
-
Publication No.: US11531638B2
Publication date: 2022-12-20
Application No.: US16114130
Filing date: 2018-08-27
Applicant: ASCENIUM INC.
Inventor(s): Robert Keith Mykland
IPC classification: G06F15/78
Abstract: A method and system are provided for configurable computation and data processing. A logical processor includes an array of logic elements. The processor may be a combinatorial circuit that can be applied to modify computational aspects of an array of reconfigurable circuits. A memory stores a plurality of instructions, each instruction including an instruction-fetch data portion and an output data transfer data portion. One or more memory controllers are coupled to the memory and receive instructions and/or output data from the memory. A back buffer is coupled with the memory controller and receives instructions from the memory controller. The back buffer sequentially asserts each received instruction upon one or more memory controllers. The memory controllers transfer data received from the memory to a target, such as an array of reconfigurable logic circuits that are optionally coupled to the memory, the back buffer, and one or more additional memory controllers.
-
Publication No.: US20220374286A1
Publication date: 2022-11-24
Application No.: US17879827
Filing date: 2022-08-03
Applicant: Ascenium, Inc.
Inventor(s): Peter Foley
Abstract: Techniques for task processing in a parallel processing architecture for atomic operations are disclosed. A two-dimensional array of compute elements is accessed, where each compute element within the array of compute elements is known to a compiler and is coupled to its neighboring compute elements within the array of compute elements. Control for the array of compute elements is provided on a cycle-by-cycle basis. The control is enabled by a stream of wide control words generated by the compiler. At least one of the control words involves an operation requiring at least one additional operation. A bit of the control word is set, where the bit indicates a multicycle operation. The control word is executed, on at least one compute element within the array of compute elements, based on the bit. The multicycle operation comprises a read-modify-write operation.
-
Publication No.: US20220214885A1
Publication date: 2022-07-07
Application No.: US17704056
Filing date: 2022-03-25
Applicant: Ascenium, Inc.
Inventor(s): Peter Foley
Abstract: Techniques for program execution in a parallel processing architecture using speculative encoding are disclosed. A two-dimensional array of compute elements is accessed, where each compute element within the array of compute elements is known to a compiler and is coupled to its neighboring compute elements within the array of compute elements. Control for the array of compute elements is provided on a cycle-by-cycle basis. The control is enabled by a stream of wide, variable length, control words generated by the compiler. Two or more operations are coalesced into a control word, where the control word includes a branch decision and operations associated with the branch decision. The coalesced control word includes speculatively encoded operations for at least two possible branch paths. The at least two possible branch paths generate independent side effects. Operations associated with the branch decision that are not indicated by the branch decision are suppressed.
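Editorial illustration (not part of the patent record): the suppression step in this abstract can be sketched as a control word that carries operations for both branch paths, of which only those on the selected path take effect. The names `Operation`, `ControlWord`, and `execute`, the path labels, and the example operation strings are hypothetical; the actual control word encoding is not described in the abstract.

```python
from dataclasses import dataclass, field


@dataclass
class Operation:
    name: str
    path: str  # branch path the operation belongs to: "taken" or "not_taken"


@dataclass
class ControlWord:
    # Operations for both possible branch paths, coalesced into one word.
    ops: list = field(default_factory=list)


def execute(control_word, branch_taken):
    """Split the word's operations into executed and suppressed lists."""
    selected = "taken" if branch_taken else "not_taken"
    executed, suppressed = [], []
    for op in control_word.ops:
        if op.path == selected:
            executed.append(op.name)
        else:
            # Not indicated by the branch decision, so it has no effect.
            suppressed.append(op.name)
    return executed, suppressed


word = ControlWord(ops=[
    Operation("add r1, r2", "taken"),
    Operation("store r1", "taken"),
    Operation("sub r1, r2", "not_taken"),
])
executed, suppressed = execute(word, branch_taken=False)
```

With `branch_taken=False`, only the `not_taken` operation is reported as executed and the two `taken` operations are reported as suppressed.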
-
Publication No.: US20220075740A1
Publication date: 2022-03-10
Application No.: US17500990
Filing date: 2021-10-14
Applicant: Ascenium, Inc.
Inventor(s): Peter Foley
Abstract: Techniques for task processing using a parallel processing architecture with background loads are disclosed. A two-dimensional array of compute elements is accessed. Each compute element is known to a compiler and is coupled to its neighboring compute elements. Operation of the array is paused. The pausing occurs while a memory system continues operation. A bus coupling the array is repurposed. The repurposing couples one or more compute elements in the array to the memory system. A memory system operation is enabled during the pausing. Data is transferred from the memory system to the array of compute elements using the bus that was repurposed. The data from the memory system is transferred to scratchpad memory in the one or more compute elements within the two-dimensional array. The scratchpad memory provides operand storage. The data is tagged. The tagging guides the transferring to a particular compute element.