Transposing Memory Layout of Weights in Deep Neural Networks (DNNs)
Abstract:
A compute block includes a direct memory access (DMA) engine that reads data from an external memory and writes the data into a local memory of the compute block. A multiply-accumulate (MAC) array in the compute block may use the data to perform convolutions. The external memory may store the weights of one or more filters in a memory layout that comprises a sequence of sections for each filter. Each section may correspond to one channel of the filter and may store all the weights in that channel. The DMA engine may convert this memory layout to a different memory layout, which includes a sequence of new sections for each filter. Each new section may include a weight vector, that is, a sequence of weights in which each weight comes from a different channel. The DMA engine may also compress the weights, e.g., by removing zero-valued weights, before converting the memory layout.
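The layout conversion described above can be illustrated in software. The following is a minimal sketch, not the DMA engine's actual implementation: it assumes a single filter stored as a flat list, assumes that each new section gathers the weights at one kernel position across all channels, and assumes that compression is represented as a sparsity bitmap plus a packed list of nonzero weights. The function names and parameters (`transpose_filter_layout`, `compress_zeros`, `transpose_compressed`, `num_channels`, `weights_per_channel`) are hypothetical and do not appear in the source.

```python
def transpose_filter_layout(weights, num_channels, weights_per_channel):
    """Convert one filter from channel-major to channel-interleaved order.

    Input:  one section per channel, each holding all weights of that channel.
    Output: one section per kernel position, each a weight vector whose
            elements come from different channels (assumed interpretation).
    """
    assert len(weights) == num_channels * weights_per_channel
    out = []
    for k in range(weights_per_channel):      # one new section per position
        for c in range(num_channels):         # the vector spans all channels
            out.append(weights[c * weights_per_channel + k])
    return out


def compress_zeros(weights):
    """Remove zero-valued weights; return (bitmap, packed nonzero weights)."""
    bitmap = [1 if w != 0 else 0 for w in weights]
    packed = [w for w in weights if w != 0]
    return bitmap, packed


def transpose_compressed(bitmap, packed, num_channels, weights_per_channel):
    """Apply the layout conversion to data that was compressed beforehand."""
    # offsets[i] is the index into `packed` of the weight at dense position i
    # (only meaningful where bitmap[i] == 1).
    offsets, count = [], 0
    for bit in bitmap:
        offsets.append(count)
        count += bit
    new_bitmap = transpose_filter_layout(bitmap, num_channels, weights_per_channel)
    new_packed = []
    for k in range(weights_per_channel):
        for c in range(num_channels):
            idx = c * weights_per_channel + k
            if bitmap[idx]:
                new_packed.append(packed[offsets[idx]])
    return new_bitmap, new_packed


# Example: one filter with 3 channels and a 2x2 kernel (4 weights per channel).
source = [1, 2, 3, 4,   5, 0, 7, 8,   9, 10, 0, 12]
dense_result = transpose_filter_layout(source, 3, 4)
bitmap, packed = compress_zeros(source)
new_bitmap, new_packed = transpose_compressed(bitmap, packed, 3, 4)
```

In this example `dense_result` is `[1, 5, 9, 2, 0, 10, 3, 7, 0, 4, 8, 12]`, so each group of three consecutive values is a weight vector drawn from the three channels. The compressed path yields the same ordering with the zeros omitted, and `new_bitmap` records where they were.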