Parallel processor optimized for machine learning
Abstract:
A parallel processor system for machine learning includes an arithmetic logic unit (ALU) array comprising several ALUs and a controller that provides instructions to the ALUs. The system further includes a direct memory access (DMA) block containing multiple DMA engines that access an external memory to retrieve data. An input-stream buffer decouples the DMA block from the ALU array and aligns and reorders the retrieved data. The DMA engines operate in parallel and include rasterization logic capable of performing three-dimensional (3-D) rasterization.
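
The abstract does not say how the rasterization logic generates addresses. As a rough illustration only, the sketch below shows one common interpretation of 3-D rasterization: walking a 3-D sub-block of a tensor stored linearly in external memory and emitting the byte address of each element in raster order (innermost dimension fastest). The row-major layout, the function names `row_major_strides` and `rasterize_3d`, and all parameters are assumptions made for this example and are not taken from the patent.

```python
from typing import Iterator, Tuple

Triple = Tuple[int, int, int]


def row_major_strides(shape: Triple, elem_size: int) -> Triple:
    """Byte strides (z, y, x) for a tensor of shape (depth, height, width).

    Assumes a dense row-major layout; a real device may use another layout.
    """
    _depth, height, width = shape
    return (height * width * elem_size, width * elem_size, elem_size)


def rasterize_3d(base: int, origin: Triple, extent: Triple,
                 strides: Triple) -> Iterator[int]:
    """Yield byte addresses of a 3-D block in raster order.

    base    -- byte address of element (0, 0, 0) in external memory
    origin  -- (z, y, x) index of the first element of the block
    extent  -- (nz, ny, nx) number of elements to fetch per dimension
    strides -- (sz, sy, sx) byte strides per dimension
    """
    z0, y0, x0 = origin
    nz, ny, nx = extent
    sz, sy, sx = strides
    for z in range(z0, z0 + nz):          # slowest-changing dimension
        for y in range(y0, y0 + ny):
            for x in range(x0, x0 + nx):  # fastest-changing dimension
                yield base + z * sz + y * sy + x * sx


if __name__ == "__main__":
    # Hypothetical 2 x 3 x 4 tensor of 1-byte elements starting at address 0.
    strides = row_major_strides((2, 3, 4), elem_size=1)
    addresses = list(rasterize_3d(0, (0, 1, 1), (2, 2, 2), strides))
    print(addresses)
```

In a design like the one described, several such address generators could run concurrently, one per DMA engine, with the input-stream buffer absorbing the differences in arrival order before the data reaches the ALU array.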