Abstract:
Methods and apparatuses relating to QR decomposition using a multiple execution unit processing system are provided. A method includes receiving input values at the processing system and generating a first set of values based on the input values, where at least some of the first values are computed in parallel. A second set of values are generated recursively based on values in the first set. A third set of values are generated based on values in the second set, where at least some of the values in the third set are computed in parallel. The recursive component may be simplified to consist of one or more low latency operations. The processing performance of operations relating to QR decomposition may therefore be improved by using the parallelism available in multiple execution unit systems.
Abstract:
A self-timed parallelized multi-core processor and method for operating the processor are provided. The processor has an instruction decoder unit to receive a program code instruction, determine an operating code and latency for the program code instructions, and assign a loop index to the program code instruction. The processor further includes an instruction decomposer unit coupled to the instruction decoder unit, the instruction decomposer configured to create a primitive by decomposing the instruction, replace the loop index with a core index, and broadcast the primitive. The processor further has a plurality of self-timed processing cores coupled to the instruction decomposer unit, each core having a unique core index and having a dispatch unit for comparing the core index in the primitive with the core index of its processing core, each core acting on the primitive when the index of the processing core is within a threshold of the core index.