1. Method and apparatus of a fully-pipelined FFT
    Granted patent (in force)

    Publication No.: US09418047B2

    Publication date: 2016-08-16

    Application No.: US14192725

    Filing date: 2014-02-27

    Applicant: Tensorcom, Inc.

    IPC class: G06F17/14

    CPC class: G06F17/142

    Abstract: A plurality of three-bit units (called triplets) are permuted by a shuffler, which shuffles the positions of the triplets into different patterns used to specify the read/write operations of a memory. For example, the least significant triplet of a conventional counter can be placed in the most significant position of a permuted three-triplet pattern. The count of this permuted counter then generates addresses that jump 64 positions each clock cycle. These permutations can be used to generate read and write control information for reading from and writing to memory banks in a layout conducive to efficient Radix-8 Butterfly operation. In addition, one or more triplets can determine whether a barrel shifter or a right circular shift is required to move data from one data lane to a second data lane. The triplets allow efficient FFT operation in a pipelined structure.

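    The abstract's addressing scheme can be illustrated with a short sketch (not the patented implementation; names and the 9-bit counter width are illustrative assumptions): a counter is split into three 3-bit triplets, and moving the least significant triplet into the most significant position produces addresses that advance by 64 on each count.

```python
def triplets(count):
    """Split a 9-bit count into three 3-bit triplets (t2, t1, t0)."""
    return (count >> 6) & 0b111, (count >> 3) & 0b111, count & 0b111

def permuted_address(count):
    """Place the least significant triplet (t0) in the most significant
    position, so consecutive counts jump 64 address positions."""
    t2, t1, t0 = triplets(count)
    return (t0 << 6) | (t2 << 3) | t1
```

    Counting 0, 1, 2, ... through `permuted_address` yields 0, 64, 128, ..., which matches the abstract's description of addresses jumping 64 positions per clock cycle.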

2. Method and Apparatus of a Fully-Pipelined Layered LDPC Decoder
    Patent application (under examination, published)

    Publication No.: US20160173131A1

    Publication date: 2016-06-16

    Application No.: US15011252

    Filing date: 2016-01-29

    Applicant: Tensorcom, Inc.

    IPC class: H03M13/11

    Abstract: Processors are arranged in a pipeline structure to operate on multiple layers of data, each layer comprising multiple groups of data. An input to a memory is coupled to an output of the last processor in the pipeline, and the memory's output is coupled to an input of the first processor in the pipeline. Multiplexing and de-multiplexing operations are performed in the pipeline. For each group in each layer, a stored result read from the memory is applied to the first processor in the pipeline structure. A calculated result of the stored result is output at the last processor and stored in the memory. Once processing for the last group of data in a first layer is completed, the corresponding processor is configured to process data in a next layer before the pipeline finishes processing the first layer. The stored result obtained from the next layer comprises a calculated result produced from a layer previous to the first layer.

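    The memory-to-pipeline feedback ring described in the abstract can be sketched as follows. This is a minimal behavioral model, not the patented hardware: the stage count, the `stage` arithmetic, and the dict-based memory are placeholder assumptions. The key point it captures is that the last processor writes back to the same memory that feeds the first processor, so a result computed for a group in one layer becomes the stored input when a later layer revisits that group.

```python
NUM_STAGES = 3  # hypothetical pipeline depth

def stage(value, stage_idx):
    """Stand-in for one processor's calculation on a group's stored result."""
    return value + 1  # placeholder arithmetic

def run_pipeline(memory, layers):
    """memory maps group -> stored result; layers is a list of group lists.
    Each group's stored result is read, passed through every pipeline stage,
    and the final processor's output is written back to the memory."""
    for layer in layers:
        for group in layer:
            x = memory[group]            # memory output -> first processor
            for s in range(NUM_STAGES):
                x = stage(x, s)
            memory[group] = x            # last processor -> memory input
    return memory
```

    If the same group appears in two layers, the second layer reads the value the first layer wrote back, mirroring the abstract's statement that a stored result obtained in a later layer comprises a calculated result from an earlier one.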

3. Method and apparatus of a fully-pipelined layered LDPC decoder
    Granted patent (in force)

    Publication No.: US09276610B2

    Publication date: 2016-03-01

    Application No.: US14165505

    Filing date: 2014-01-27

    Applicant: Tensorcom, Inc.

    IPC class: H03M13/11

    Abstract: The architecture is able to switch to a non-blocking check-node-update (CNU) scheduling architecture, which has better performance than a blocking CNU scheduling architecture. The architecture uses an Offset Min-Sum algorithm with Beta=1 and a clock domain operating at 440 MHz. The constraint macro-matrix is a sparse matrix in which each "1" corresponds to a sub-array that is a cyclically shifted identity matrix. Four core processors are used in the layered architecture, where the constraint matrix uses a sub-array of 42 (check nodes)×42 (variable nodes) within the macro-array of 168×672 bits. Pipeline processing is used, and the delay for each layer requires only 4 clock cycles.

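    For reference, the Offset Min-Sum check-node update with Beta=1 named in the abstract can be sketched as below. This is a generic textbook formulation, not the patent's circuit: for each edge, the outgoing magnitude is the minimum magnitude over the other incoming messages minus the offset (floored at zero), and the outgoing sign is the product of the other messages' signs.

```python
BETA = 1  # the offset value stated in the abstract

def cnu_offset_min_sum(msgs):
    """Offset Min-Sum check-node update over a list of incoming messages.
    Returns one outgoing message per incoming edge."""
    out = []
    for i in range(len(msgs)):
        others = msgs[:i] + msgs[i + 1:]
        # magnitude: min over the other inputs, reduced by the offset
        mag = max(min(abs(m) for m in others) - BETA, 0)
        # sign: product of the other inputs' signs
        sign = 1
        for m in others:
            if m < 0:
                sign = -sign
        out.append(sign * mag)
    return out
```

    In a layered decoder, one such update runs per check-node row of the 42×42 sub-array; the offset compensates for Min-Sum's overestimation of the true sum-product magnitude.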

4. Method and Apparatus of a Fully-Pipelined Layered LDPC Decoder
    Patent application (in force)

    Publication No.: US20150214980A1

    Publication date: 2015-07-30

    Application No.: US14165505

    Filing date: 2014-01-27

    Applicant: Tensorcom, Inc.

    IPC class: H03M13/11

    Abstract: The architecture is able to switch to a non-blocking check-node-update (CNU) scheduling architecture, which has better performance than a blocking CNU scheduling architecture. The architecture uses an Offset Min-Sum algorithm with Beta=1 and a clock domain operating at 440 MHz. The constraint macro-matrix is a sparse matrix in which each "1" corresponds to a sub-array that is a cyclically shifted identity matrix. Four core processors are used in the layered architecture, where the constraint matrix uses a sub-array of 42 (check nodes)×42 (variable nodes) within the macro-array of 168×672 bits. Pipeline processing is used, and the delay for each layer requires only 4 clock cycles.


5. Method and apparatus of a fully-pipelined layered LDPC decoder

    Publication No.: US10778250B2

    Publication date: 2020-09-15

    Application No.: US16277890

    Filing date: 2019-02-15

    Applicant: TensorCom, Inc.

    IPC class: H03M13/11

    Abstract: Processors are arranged in a pipeline structure to operate on multiple layers of data, each layer comprising multiple groups of data. An input to a memory is coupled to an output of the last processor in the pipeline, and the memory's output is coupled to an input of the first processor in the pipeline. Multiplexing and de-multiplexing operations are performed in the pipeline. For each group in each layer, a stored result read from the memory is applied to the first processor in the pipeline structure. A calculated result of the stored result is output at the last processor and stored in the memory. Once processing for the last group of data in a first layer is completed, the corresponding processor is configured to process data in a next layer before the pipeline finishes processing the first layer. The stored result obtained from the next layer comprises a calculated result produced from a layer previous to the first layer.

6. METHOD AND APPARATUS OF A FULLY-PIPELINED LAYERED LDPC DECODER

    Publication No.: US20190222227A1

    Publication date: 2019-07-18

    Application No.: US16277890

    Filing date: 2019-02-15

    Applicant: TensorCom, Inc.

    IPC class: H03M13/11

    Abstract: Processors are arranged in a pipeline structure to operate on multiple layers of data, each layer comprising multiple groups of data. An input to a memory is coupled to an output of the last processor in the pipeline, and the memory's output is coupled to an input of the first processor in the pipeline. Multiplexing and de-multiplexing operations are performed in the pipeline. For each group in each layer, a stored result read from the memory is applied to the first processor in the pipeline structure. A calculated result of the stored result is output at the last processor and stored in the memory. Once processing for the last group of data in a first layer is completed, the corresponding processor is configured to process data in a next layer before the pipeline finishes processing the first layer. The stored result obtained from the next layer comprises a calculated result produced from a layer previous to the first layer.

7. Method and apparatus of a fully-pipelined layered LDPC decoder

    Publication No.: US10250280B2

    Publication date: 2019-04-02

    Application No.: US15011252

    Filing date: 2016-01-29

    Applicant: Tensorcom, Inc.

    IPC class: H03M13/11

    Abstract: Processors are arranged in a pipeline structure to operate on multiple layers of data, each layer comprising multiple groups of data. An input to a memory is coupled to an output of the last processor in the pipeline, and the memory's output is coupled to an input of the first processor in the pipeline. Multiplexing and de-multiplexing operations are performed in the pipeline. For each group in each layer, a stored result read from the memory is applied to the first processor in the pipeline structure. A calculated result of the stored result is output at the last processor and stored in the memory. Once processing for the last group of data in a first layer is completed, the corresponding processor is configured to process data in a next layer before the pipeline finishes processing the first layer. The stored result obtained from the next layer comprises a calculated result produced from a layer previous to the first layer.