PERFORMING MULTI-CONVOLUTION OPERATIONS IN A PARALLEL PROCESSING SYSTEM
    Invention application (status: under examination, published)

    Publication number: US20160062947A1

    Publication date: 2016-03-03

    Application number: US14838291

    Filing date: 2015-08-27

    CPC classification number: G06F17/153

    Abstract: In one embodiment of the present invention, a convolution engine configures a parallel processing pipeline to perform multi-convolution operations. More specifically, the convolution engine configures the parallel processing pipeline to independently generate and process individual image tiles. In operation, for each image tile, the pipeline calculates source locations included in an input image batch. Notably, the source locations reflect the contribution of the image tile to an output tile of an output matrix, which is the result of the multi-convolution operation. Subsequently, the pipeline copies data from the source locations to the image tile. Similarly, the pipeline copies data from a filter stack to a filter tile. The pipeline then performs matrix multiplication operations between the image tile and the filter tile to generate data included in the corresponding output tile. To optimize both on-chip memory usage and execution time, the pipeline creates each image tile in on-chip memory as needed.
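
The tile-by-tile flow described in the abstract can be illustrated with a small sequential sketch. This is not the patented implementation: it is a plain-Python model under stated assumptions (stride 1, no padding, no filter flip, nested lists laid out as images[n][c][y][x] and filters[k][c][r][s]), and the names source_location, multi_convolution_tiled, and tile are hypothetical. Local Python lists stand in for on-chip memory, and the loops over output tiles stand in for work that a parallel pipeline would process independently.

```python
def source_location(row, col, H, W, R, S):
    """Map one element (row, col) of the never-materialized image matrix
    to its source location (n, c, y, x) in the input image batch."""
    P, Q = H - R + 1, W - S + 1      # output height/width (assumed stride 1, no padding)
    c, rs = divmod(row, R * S)       # row encodes (channel, filter row, filter column)
    r, s = divmod(rs, S)
    n, pq = divmod(col, P * Q)       # col encodes (image, output row, output column)
    p, q = divmod(pq, Q)
    return n, c, p + r, q + s


def multi_convolution_tiled(images, filters, tile=2):
    """Return the K x (N*P*Q) output matrix, computed one output tile at a time."""
    N, C = len(images), len(images[0])
    H, W = len(images[0][0]), len(images[0][0][0])
    K, R, S = len(filters), len(filters[0][0]), len(filters[0][0][0])
    P, Q = H - R + 1, W - S + 1
    depth, cols = C * R * S, N * P * Q
    out = [[0] * cols for _ in range(K)]

    for k0 in range(0, K, tile):                 # each (k0, j0) pair is one output tile
        for j0 in range(0, cols, tile):
            ks = range(k0, min(k0 + tile, K))
            js = range(j0, min(j0 + tile, cols))
            for d0 in range(0, depth, tile):     # walk the reduction dimension in chunks
                ds = range(d0, min(d0 + tile, depth))

                # Build the image tile on demand from computed source locations.
                image_tile = []
                for d in ds:
                    tile_row = []
                    for j in js:
                        n, c, y, x = source_location(d, j, H, W, R, S)
                        tile_row.append(images[n][c][y][x])
                    image_tile.append(tile_row)

                # Copy the matching slice of the filter stack into a filter tile.
                filter_tile = []
                for k in ks:
                    tile_row = []
                    for d in ds:
                        c, rs = divmod(d, R * S)
                        r, s = divmod(rs, S)
                        tile_row.append(filters[k][c][r][s])
                    filter_tile.append(tile_row)

                # Multiply filter tile by image tile and accumulate into the output tile.
                for a, k in enumerate(ks):
                    for b, j in enumerate(js):
                        out[k][j] += sum(
                            filter_tile[a][i] * image_tile[i][b]
                            for i in range(len(ds))
                        )
    return out
```

For example, multi_convolution_tiled([[[[1, 2], [3, 4]]]], [[[[1, 0], [0, 1]]]]) returns [[5]]. The point of the sketch is that the full image matrix is never stored: only one small image tile exists at a time, which mirrors the abstract's goal of reducing on-chip memory usage.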

