Abstract:
Embodiments relate to a neural processor circuit that includes a kernel access circuit and multiple neural engine circuits. The kernel access circuit reads compressed kernel data from memory external to the neural processor circuit. Each neural engine circuit receives compressed kernel data from the kernel access circuit. Each neural engine circuit includes a kernel extract circuit and a kernel multiply-add (MAD) circuit. The kernel extract circuit extracts uncompressed kernel data from the compressed kernel data. The kernel MAD circuit receives the uncompressed kernel data from the kernel extract circuit and performs neural network operations on a portion of input data using the uncompressed kernel data.
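The extract-then-multiply-add flow can be sketched in a few lines. The abstract does not say how the kernel data is compressed, so the bitmask-plus-nonzero-values encoding below is purely an assumption for illustration, and the names `extract_kernel` and `multiply_add` are hypothetical.

```python
def extract_kernel(mask, nonzero_values):
    """Expand a (mask, values) pair into a dense list of kernel coefficients.

    Assumed encoding (not taken from the abstract): mask[i] is 1 where the
    coefficient is nonzero, and nonzero_values lists those coefficients in order.
    """
    values = iter(nonzero_values)
    return [next(values) if bit else 0 for bit in mask]


def multiply_add(input_patch, kernel):
    """Multiply-accumulate one portion of input data against the dense kernel."""
    if len(input_patch) != len(kernel):
        raise ValueError("patch and kernel must have the same length")
    return sum(x * k for x, k in zip(input_patch, kernel))


compressed = ([1, 0, 0, 1], [2, -1])          # mask, nonzero coefficients
kernel = extract_kernel(*compressed)          # dense kernel: [2, 0, 0, -1]
result = multiply_add([3, 5, 7, 4], kernel)   # 3*2 + 4*(-1) = 2
```

In this sketch only the compressed form would cross the external memory interface; expansion happens next to the multiply-add step, which mirrors the placement of the kernel extract circuit inside each neural engine circuit.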
Abstract:
Embodiments relate to a neural processor circuit including one or more planar engine circuits that perform non-convolution operations in parallel with convolution operations performed by one or more neural engine circuits. The neural engine circuits perform the convolution operations on neural input data corresponding to one or more neural engine tasks to generate neural output data. The planar engine circuits perform non-convolution operations on planar input data corresponding to one or more planar engine tasks to generate planar output data. A data processor circuit in the neural processor circuit addresses data dependency between the one or more neural engine tasks and the one or more planar engine tasks by controlling reading of the neural output data as the planar input data by the planar engine circuits or reading of the planar output data as the neural input data by the neural engine circuits.
Abstract:
Embodiments of the present disclosure relate to chained buffers in a neural processor circuit. The neural processor circuit includes multiple neural engines, a planar engine, a buffer memory, and a flow control circuit. At least one neural engine operates as a first producer of first data or a first consumer of second data. The planar engine operates as a second consumer receiving the first data from the first producer or a second producer sending the second data to the first consumer. Data flow between the at least one neural engine and the planar engine is controlled using at least a subset of buffers in the buffer memory operating as at least one chained buffer that chains flow of the first data and the second data between the at least one neural engine and the planar engine.
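One way to picture the producer and consumer flow control is a bounded FIFO shared by the two engines. This is a software sketch only: the class `ChainedBuffer`, the function `run_chain`, and the stall conditions are hypothetical stand-ins, not the flow control circuit the abstract describes.

```python
from collections import deque


class ChainedBuffer:
    """Bounded FIFO standing in for buffers chained between two engines."""

    def __init__(self, num_slots):
        self.num_slots = num_slots
        self.slots = deque()

    def can_write(self):
        return len(self.slots) < self.num_slots

    def can_read(self):
        return len(self.slots) > 0

    def write(self, data):
        if not self.can_write():
            raise RuntimeError("producer must stall: no free slot")
        self.slots.append(data)

    def read(self):
        if not self.can_read():
            raise RuntimeError("consumer must stall: no data ready")
        return self.slots.popleft()


def run_chain(work_items, num_slots):
    """Interleave a producer and a consumer, pausing whichever is blocked."""
    buf = ChainedBuffer(num_slots)
    pending = deque(work_items)
    consumed = []
    while pending or buf.can_read():
        # Producer (for example, a neural engine) fills every free slot.
        while pending and buf.can_write():
            buf.write(pending.popleft())
        # Consumer (for example, the planar engine) drains one slot per step.
        consumed.append(buf.read())
    return consumed


order = run_chain(["tile0", "tile1", "tile2"], num_slots=2)
```

Swapping the roles, with the planar engine writing and a neural engine reading, uses the same structure, which is the symmetry the abstract points to when either engine can act as producer or consumer.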
Abstract:
Embodiments of the present disclosure relate to a neural engine of a neural processor circuit having multiple multiply-add circuits and an accumulator circuit coupled to the multiply-add circuits. The multiply-add circuits perform multiply-add operations of a three-dimensional convolution on a work unit of input data using a kernel to generate at least a portion of output data in a processing cycle. The accumulator circuit includes multiple batches of accumulators. Each batch of accumulators receives and stores, after the processing cycle, the portion of the output data for each output depth plane of multiple output depth planes. A corresponding batch of accumulators stores, after the processing cycle, the portion of the output data for a subset of the output channels and for each output depth plane.
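The accumulation pattern can be illustrated with a small software model. The sketch below assumes a single input channel, unit stride, no padding, and one "processing cycle" per kernel depth slice; none of these details come from the abstract, and the function name `conv3d_accumulate` is hypothetical.

```python
def conv3d_accumulate(volume, kernels):
    """3D convolution that builds its output through running accumulators.

    volume[d][y][x] is the input; kernels[c][kd][ky][kx] holds one kernel per
    output channel c. Returns acc[c][od][oy][ox], one accumulator per output
    channel, output depth plane, and spatial position.
    """
    D, H, W = len(volume), len(volume[0]), len(volume[0][0])
    KD, KH, KW = len(kernels[0]), len(kernels[0][0]), len(kernels[0][0][0])
    OD, OH, OW = D - KD + 1, H - KH + 1, W - KW + 1
    acc = [[[[0] * OW for _ in range(OH)] for _ in range(OD)] for _ in kernels]
    for kd in range(KD):  # assumed: one processing cycle per kernel depth slice
        for c, kernel in enumerate(kernels):
            for od in range(OD):
                for oy in range(OH):
                    for ox in range(OW):
                        # Partial result from this cycle's multiply-adds.
                        partial = sum(
                            volume[od + kd][oy + ky][ox + kx] * kernel[kd][ky][kx]
                            for ky in range(KH)
                            for kx in range(KW)
                        )
                        # The accumulator keeps the running sum across cycles.
                        acc[c][od][oy][ox] += partial
    return acc


volume = [[[1]], [[2]], [[3]]]        # depth 3, height 1, width 1
kernels = [
    [[[1]], [[10]]],                  # output channel 0, kernel depth 2
    [[[2]], [[0]]],                   # output channel 1, kernel depth 2
]
acc = conv3d_accumulate(volume, kernels)
```

Each `acc[c][od]` entry plays the role of an accumulator tied to one output channel and one output depth plane, so partial sums for several depth planes stay resident between cycles.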
Abstract:
Embodiments relate to a neural processor that includes a plurality of neural engine circuits and one or more planar engine circuits. The plurality of neural engine circuits can perform convolution operations of input data of the neural engine circuits with one or more kernels to generate outputs. The planar engine circuit is coupled to the plurality of neural engine circuits. The planar engine circuit generates an output from input data that corresponds to output of the neural engine circuits or a version of input data of the neural processor. The planar engine circuit can be configured to operate in multiple modes. In a pooling mode, the planar engine circuit reduces a spatial size of a version of the input data. In an elementwise mode, the planar engine circuit performs an elementwise operation on the input data. In a reduction mode, the planar engine circuit reduces the rank of a tensor.
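The three modes can be made concrete with a small example on a 4×4 plane. The specific operations chosen here (max pooling with a 2×2 window, elementwise addition of two tensors, and a row sum as the reduction) are assumptions for illustration; the abstract names the modes but not the operations.

```python
import operator


def pooling(plane, size):
    """Pooling mode: max over size x size windows, stride equal to size."""
    rows, cols = len(plane), len(plane[0])
    return [
        [
            max(plane[r + dr][c + dc] for dr in range(size) for dc in range(size))
            for c in range(0, cols - size + 1, size)
        ]
        for r in range(0, rows - size + 1, size)
    ]


def elementwise(a, b, op):
    """Elementwise mode: combine two same-shaped planes entry by entry."""
    return [[op(x, y) for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


def reduction(plane):
    """Reduction mode: rank 2 to rank 1 by summing along each row."""
    return [sum(row) for row in plane]


plane = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]
pooled = pooling(plane, 2)                          # 4x4 -> 2x2
doubled = elementwise(plane, plane, operator.add)   # same shape as the input
row_sums = reduction(plane)                         # rank 2 -> rank 1
```

Pooling shrinks the spatial size but keeps the rank, elementwise keeps both, and reduction drops a dimension, which is the distinction the three modes draw.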
Abstract:
A device includes an integrated circuit that has a tiler circuit, a grid generator, and a warper circuit. The tiler circuit divides distorted input image data into a plurality of image tiles and stores the image tiles in a memory device. Each image tile is an M×N array of pixel samples, where M and N are greater than 1. The grid generator produces a mesh grid that describes a mapping of first pixel locations of the distorted image data to second pixel locations of corrected image data. The warper circuit reads one or more of the image tiles from the memory device based on the mesh grid and interpolates a warped output image from the image tiles read from the memory device.
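The grid lookup and interpolation step can be sketched as follows. Several things are assumed here and are not stated in the abstract: the mesh grid is dense (one source coordinate per output pixel), the interpolation is bilinear, coordinates are clamped at the image border, and tiling is omitted so the whole image is treated as one tile. The names `bilinear_sample` and `warp` are hypothetical.

```python
import math


def bilinear_sample(image, x, y):
    """Interpolate image at fractional location (x, y), clamping at borders."""
    height, width = len(image), len(image[0])
    x = min(max(x, 0.0), width - 1.0)
    y = min(max(y, 0.0), height - 1.0)
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, width - 1), min(y0 + 1, height - 1)
    fx, fy = x - x0, y - y0
    top = image[y0][x0] * (1 - fx) + image[y0][x1] * fx
    bottom = image[y1][x0] * (1 - fx) + image[y1][x1] * fx
    return top * (1 - fy) + bottom * fy


def warp(image, grid):
    """grid[r][c] is the assumed (x, y) source location for output pixel (r, c)."""
    return [[bilinear_sample(image, x, y) for (x, y) in row] for row in grid]


distorted = [[0.0, 10.0], [20.0, 30.0]]
grid = [
    [(0.0, 0.0), (0.5, 0.0)],
    [(0.0, 0.5), (0.5, 0.5)],
]
corrected = warp(distorted, grid)
```

In a tiled design the four samples read by `bilinear_sample` would come from whichever tiles the grid coordinate falls in, so the grid determines which tiles are read from memory.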
Abstract:
The present disclosure generally relates to systems and methods for image data processing. In certain embodiments, an image processing pipeline may detect and correct a defective pixel of image data acquired using an image sensor. The image processing pipeline may receive an input pixel of the image data acquired using the image sensor. The image processing pipeline may then identify a set of neighboring pixels having the same color component as the input pixel and remove two neighboring pixels from the set of neighboring pixels thereby generating a modified set of neighboring pixels. Here, the two neighboring pixels correspond to a maximum pixel value and a minimum pixel value of the set of neighboring pixels. The image processing pipeline may then determine a gradient for each neighboring pixel in the modified set of neighboring pixels and determine whether the input pixel includes a dynamic defect or a speckle based at least in part on the gradient for each neighboring pixel in the modified set of neighboring pixels.
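The trimming and gradient steps can be sketched as below. The abstract does not define the gradient or the decision rule, so this sketch assumes the gradient is the absolute difference between the input pixel and a neighbor, and that the pixel is flagged when every gradient exceeds a threshold. The function name `is_defective` and the `threshold` parameter are hypothetical, and the sketch does not distinguish a dynamic defect from a speckle.

```python
def is_defective(pixel, neighbors, threshold):
    """Flag a pixel that differs strongly from all trimmed same-color neighbors.

    neighbors: values of neighboring pixels with the same color component.
    threshold: assumed tuning parameter; not specified in the abstract.
    """
    if len(neighbors) < 3:
        raise ValueError("need at least three same-color neighbors")
    # Remove the two neighbors holding the minimum and maximum values.
    trimmed = sorted(neighbors)[1:-1]
    # Assumed gradient: absolute difference between the pixel and each neighbor.
    gradients = [abs(pixel - n) for n in trimmed]
    return all(g > threshold for g in gradients)


neighbors = [100, 102, 98, 101, 99, 255, 0, 100]
hot_pixel = is_defective(250, neighbors, threshold=50)
normal_pixel = is_defective(101, neighbors, threshold=50)
```

Dropping the extreme neighbors first keeps a single defective neighbor (here the 255 and the 0) from skewing the comparison.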