Abstract:
A plurality of three-bit units (called triplets) is permuted by a shuffler, which shuffles the positions of the triplets into different patterns used to specify the read/write operations of a memory. For example, the least significant triplet of a conventional counter can be placed in the most significant position of a permuted three-triplet pattern. As this permuted counter counts, it generates addresses that jump 64 positions each clock cycle. These permutations can then be used to generate read and write control information to read from/write to memory banks in a manner conducive to efficient Radix-8 Butterfly operation. In addition, one or more triplets can also determine whether a barrel shift or right circular shift is required to move data from one data lane to a second data lane. The triplets allow efficient FFT operation in a pipelined structure.
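As a rough illustration of the address permutation described above, the following Python sketch splits a 9-bit counter into three triplets and moves the least significant triplet to the most significant position; the function names and permutation encoding are illustrative assumptions, not taken from the source.

```python
def split_triplets(value):
    """Split a 9-bit value into three 3-bit triplets, least significant first."""
    return [(value >> (3 * i)) & 0b111 for i in range(3)]

def join_triplets(triplets):
    """Reassemble triplets (least significant first) into an integer."""
    return sum(t << (3 * i) for i, t in enumerate(triplets))

def permuted_address(count, perm):
    """Apply a triplet permutation: perm[i] names the source triplet
    placed at destination position i (0 = least significant)."""
    t = split_triplets(count)
    return join_triplets([t[p] for p in perm])

# Place the least significant triplet (source 0) in the most significant
# position (destination 2); the other two triplets move down one position.
perm = [1, 2, 0]

for count in range(4):
    print(count, permuted_address(count, perm))
# 0 0
# 1 64   <- each tick of the original counter advances the address by 64
# 2 128
# 3 192
```

Because the triplet that increments every cycle now occupies bits 6-8 of the address, successive reads stride through the memory banks 64 positions apart, which is the access pattern the abstract attributes to the permuted counter.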
Abstract:
Processors are arranged in a pipeline structure to operate on multiple layers of data, each layer comprising multiple groups of data. An input of a memory is coupled to an output of the last processor in the pipeline, and the memory's output is coupled to an input of the first processor in the pipeline. Multiplexing and de-multiplexing operations are performed in the pipeline. For each group in each layer, a stored result read from the memory is applied to the first processor in the pipeline structure. A calculated result of the stored result is output by the last processor and stored in the memory. Once processing for the last group of data in a first layer is completed, the corresponding processor is configured to process data in a next layer before the pipeline finishes processing the first layer. The stored result obtained for the next layer comprises a calculated result produced from a layer previous to the first layer.
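The following Python sketch is a minimal software model of the feedback pipeline described above, assuming three stages and two groups per layer so that consecutive layers overlap in flight; all names and the per-stage computation are illustrative assumptions, not taken from the source.

```python
from collections import deque

NUM_STAGES = 3   # processors in the pipeline
NUM_GROUPS = 2   # groups per layer; fewer groups than stages, so
                 # consecutive layers overlap inside the pipeline
NUM_LAYERS = 3

def stage_fn(stage, value):
    """Placeholder per-processor computation."""
    return value + 1

memory = [0] * NUM_GROUPS               # one stored result per group
pipeline = deque([None] * NUM_STAGES)   # items in flight: (group, value)

def clock(new_item):
    """One clock cycle: each occupied stage applies its computation,
    then data shifts one stage along; the last stage's item retires."""
    for s in range(NUM_STAGES):
        if pipeline[s] is not None:
            g, v = pipeline[s]
            pipeline[s] = (g, stage_fn(s, v))
    retired = pipeline.pop()
    pipeline.appendleft(new_item)
    return retired

def retire(item):
    """Write the last processor's output back to the memory."""
    if item is not None:
        g, v = item
        memory[g] = v

# Issue groups back-to-back across layers: the first group of the next
# layer enters before the current layer has drained, so it may read a
# stored result produced two layers earlier, as the abstract describes.
for layer in range(NUM_LAYERS):
    for group in range(NUM_GROUPS):
        retire(clock((group, memory[group])))

# Drain the pipeline after the final group has been issued.
for _ in range(NUM_STAGES):
    retire(clock(None))
```

With these parameters, group 0 of the second layer enters the pipeline one cycle before group 0 of the first layer retires, so the value it reads from memory is the result left by the layer previous to the first layer, matching the last sentence of the abstract.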
Abstract:
The architecture is able to switch to a non-blocking check-node-update (CNU) scheduling architecture, which has better performance than a blocking CNU scheduling architecture. The architecture uses an Offset Min-Sum algorithm with Beta=1 and a clock domain operating at 440 MHz. The constraint macro-matrix is a sparse matrix in which each "1" corresponds to a sub-array that is a cyclically shifted identity matrix. Four core processors are used in the layered architecture, where the constraint matrix uses sub-arrays of 42 (check nodes)×42 (variable nodes) in a macro-array of 168×672 bits. Pipeline processing is used, and the delay for each layer is only 4 clock cycles.
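The check-node update itself can be sketched as follows. This is a generic Offset Min-Sum update with offset Beta=1 operating on one check node's incoming LLRs, not the patented hardware implementation; the function and variable names are illustrative assumptions.

```python
def cnu_offset_min_sum(llrs, beta=1):
    """Offset Min-Sum check-node update for one check node.

    llrs: incoming variable-to-check messages (LLRs).
    For each edge, the outgoing sign is the product of the other edges'
    signs, and the outgoing magnitude is the minimum of the other edges'
    magnitudes, reduced by the offset beta (clamped at zero).
    """
    signs = [1 if x >= 0 else -1 for x in llrs]
    mags = [abs(x) for x in llrs]
    total_sign = 1
    for s in signs:
        total_sign *= s

    # Track the two smallest magnitudes so each edge can exclude itself.
    order = sorted(range(len(mags)), key=lambda i: mags[i])
    min1_idx = order[0]
    min1, min2 = mags[min1_idx], mags[order[1]]

    out = []
    for i in range(len(llrs)):
        m = min2 if i == min1_idx else min1
        m = max(m - beta, 0)                 # offset correction, Beta = 1
        out.append(total_sign * signs[i] * m)
    return out

print(cnu_offset_min_sum([3, -5, 2, 7]))
# [-1, 1, -2, -1]
```

Keeping only the two smallest magnitudes is the standard simplification that makes min-sum CNUs cheap in hardware, which is consistent with the low per-layer latency the abstract reports.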