NEURAL ENGINE WITH ACCELERATED MULTIPLIER-ACCUMULATOR FOR CONVOLUTION OF INTEGERS

    Publication No.: US20240329933A1

    Publication Date: 2024-10-03

    Application No.: US18127650

    Application Date: 2023-03-28

    Applicant: Apple Inc.

    CPC classification number: G06F7/5443 G06N3/063

    Abstract: Embodiments of the present disclosure relate to a multiply-accumulator circuit that includes a main multiplier circuit operable in a floating-point mode or an integer mode and a supplemental multiplier circuit that operates only in the integer mode. The main multiplier circuit generates a multiplied output that undergoes subsequent operations, including a shifting operation in the floating-point mode, whereas the supplemental multiplier circuit generates another multiplied output that does not undergo any shifting operations. Hence, in the integer mode, two parallel multiply-add operations may be performed by the two multiplier circuits, thereby accelerating the multiply-add operations. Because no additional shifters are associated with the supplemental multiplier circuit, the multiply-accumulator circuit does not have a significantly increased footprint.
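The integer-mode behavior described in the abstract can be sketched in Python; all names here are illustrative, not taken from the patent. Each "cycle" the main multiplier consumes one (input, coefficient) pair and the supplemental multiplier consumes a second pair, so two multiply-adds complete per cycle with no shift stage, since integer products need no exponent alignment.

```python
def mac_int_mode(pairs, acc=0):
    """Illustrative integer mode of the dual-multiplier MAC: the main and
    supplemental multipliers each handle one (input, coefficient) pair per
    cycle, doubling multiply-add throughput versus a single multiplier."""
    for i in range(0, len(pairs), 2):
        a0, b0 = pairs[i]              # main multiplier circuit
        acc += a0 * b0
        if i + 1 < len(pairs):
            a1, b1 = pairs[i + 1]      # supplemental multiplier circuit
            acc += a1 * b1             # no shifting operation applied
    return acc
```

The result matches a plain sequential multiply-accumulate; only the cycle count differs in hardware.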

    Texture unit circuit in neural network processor

    Publication No.: US12079724B2

    Publication Date: 2024-09-03

    Application No.: US18484203

    Application Date: 2023-10-10

    Applicant: Apple Inc.

    CPC classification number: G06N3/08

    Abstract: Embodiments of the present disclosure relate to a texture unit circuit in a neural processor circuit. The neural processor circuit includes a tensor access operation circuit with the texture unit circuit, a data processor circuit, and at least one neural engine circuit. The texture unit circuit fetches a source tensor from a system memory by referencing an index tensor in the system memory representing indexing information into the source tensor. The data processor circuit stores an output version of the source tensor obtained from the tensor access operation circuit and sends the output version of the source tensor as multiple units of input data to the at least one neural engine circuit. The at least one neural engine circuit performs at least convolution operations on the units of input data and at least one kernel to generate output data.
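The fetch-by-index-tensor operation is essentially a gather. A minimal Python sketch, using flat indices and illustrative names (the patent does not specify this interface):

```python
def texture_fetch(source, index):
    """Gather elements of the source tensor using the index tensor as
    indexing information, as the texture unit circuit does when fetching
    from system memory (illustrative flat-index variant)."""
    return [source[i] for i in index]
```

A real texture unit would resolve multi-dimensional coordinates and stream the result through the data processor circuit; this sketch shows only the indexing semantics.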

    Neural Network Processor for Handling Differing Datatypes

    Publication No.: US20230169308A1

    Publication Date: 2023-06-01

    Application No.: US18095960

    Application Date: 2023-01-11

    Applicant: Apple Inc.

    CPC classification number: G06N3/04 G06N3/08 G06F7/523 G06F7/50

    Abstract: Embodiments relate to a neural engine circuit that includes an input buffer circuit, a kernel extract circuit, and a multiply-accumulator (MAC) circuit. The MAC circuit receives input data from the input buffer circuit and a kernel coefficient from the kernel extract circuit. The MAC circuit contains several multiply-add (MAD) circuits and accumulators used to perform neural network operations on the received input data and kernel coefficients. MAD circuits are configured to support fixed-point precision (e.g., INT8) and floating-point precision (e.g., FP16) of operands. In floating-point mode, each MAD circuit multiplies the integer bits of input data and kernel coefficients and adds their exponent bits to determine a binary point for alignment. In fixed-point mode, input data and kernel coefficients are multiplied directly. In both operation modes, the output data is stored in an accumulator and may be sent back as accumulated values for further multiply-add operations in subsequent processing cycles.
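The floating-point mode in the abstract (multiply the integer parts, add the exponents, then align binary points) can be sketched with a simple (mantissa, exponent) representation; the function names and representation are assumptions for illustration, not the circuit's actual datapath.

```python
def fp_mul(mant_a, exp_a, mant_b, exp_b):
    """Floating-point mode of a MAD circuit: multiply the integer
    (mantissa) bits and add the exponent bits, which together determine
    the binary point of the product."""
    return mant_a * mant_b, exp_a + exp_b

def align_add(m1, e1, m2, e2):
    """Accumulate two products by right-shifting the smaller-exponent
    mantissa so both share a binary point before adding."""
    if e1 < e2:
        m1, e1, m2, e2 = m2, e2, m1, e1
    return m1 + (m2 >> (e1 - e2)), e1
```

In fixed-point mode only the mantissa multiply is needed; the alignment step (and its shifter) is bypassed.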

    Chained buffers in neural network processor

    Publication No.: US11513799B2

    Publication Date: 2022-11-29

    Application No.: US16673499

    Application Date: 2019-11-04

    Applicant: Apple Inc.

    Abstract: Embodiments of the present disclosure relate to chained buffers in a neural processor circuit. The neural processor circuit includes multiple neural engines, a planar engine, a buffer memory, and a flow control circuit. At least one neural engine operates as a first producer of first data or a first consumer of second data. The planar engine operates as a second consumer receiving the first data from the first producer or a second producer sending the second data to the first consumer. Data flow between the at least one neural engine and the planar engine is controlled using at least a subset of buffers in the buffer memory operating as at least one chained buffer that chains flow of the first data and the second data between the at least one neural engine and the planar engine.
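The producer/consumer flow control in the abstract can be sketched as a bounded buffer with back-pressure. This is a minimal, purely illustrative model (class and method names are assumptions), standing in for the flow control circuit chaining a neural engine to the planar engine.

```python
from collections import deque

class ChainedBuffer:
    """Bounded buffer chaining data from a producer (e.g. a neural engine)
    to a consumer (e.g. the planar engine). The producer stalls when the
    buffer is full, modeling flow control over the shared buffer memory."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.slots = deque()

    def produce(self, item):
        if len(self.slots) >= self.capacity:
            return False        # back-pressure: producer must wait
        self.slots.append(item)
        return True

    def consume(self):
        # Consumer drains in FIFO order; None signals an empty buffer.
        return self.slots.popleft() if self.slots else None
```

Two such buffers back-to-back (engine → planar → engine) give the bidirectional chaining the abstract describes.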

    TASK CONTEXT SWITCH FOR NEURAL PROCESSOR CIRCUIT

    Publication No.: US20220237438A1

    Publication Date: 2022-07-28

    Application No.: US17155878

    Application Date: 2021-01-22

    Applicant: Apple Inc.

    Abstract: A neural processor includes neural engines for performing convolution operations on input data corresponding to one or more tasks to generate output data. The neural processor also includes a data processor circuit coupled to external system memory. The data processor circuit includes a buffer for storing the output data from the neural engines. The neural processor further includes a task manager coupled to the data processor circuit. The task manager receives a context-switch task. The context-switch task specifies a switch of the data processor circuit from handling an outgoing task to an incoming task. The task manager sends configuration data of the context-switch task to cause the data processor circuit to transmit the output data corresponding to the outgoing task from the buffer to the external system memory. The data processor circuit also fetches data corresponding to the incoming task from the external system memory to the buffer.
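The spill-then-fill sequence of a context-switch task can be sketched as follows; the dict-based buffer and memory and the function name are illustrative assumptions, not the patent's interfaces.

```python
def context_switch(buffer, memory, outgoing_id, incoming_id):
    """Illustrative context switch: spill the outgoing task's buffered
    output data to external system memory, then fetch the incoming
    task's data from system memory into the buffer."""
    memory[outgoing_id] = buffer.pop(outgoing_id, [])   # buffer -> memory
    buffer[incoming_id] = memory.get(incoming_id, [])   # memory -> buffer
    return buffer
```

In the patent, the task manager triggers this by sending the context-switch task's configuration data to the data processor circuit rather than performing the moves itself.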

    MULTI-OPERATIONAL MODES OF NEURAL ENGINE CIRCUIT

    Publication No.: US20220222510A1

    Publication Date: 2022-07-14

    Application No.: US17148432

    Application Date: 2021-01-13

    Applicant: Apple Inc.

    Abstract: Embodiments relate to a neural engine circuit of a neural network processor circuit that performs a convolution operation on input data in a first mode and a parallel sorting operation on input data in a second mode. The neural engine circuit includes a plurality of operation circuits and an accumulator circuit coupled to the plurality of operation circuits. The plurality of operation circuits receives input data. In the first mode, the plurality of operation circuits performs multiply-add operations of a convolution on the input data using a kernel. In the second mode, the plurality of operation circuits performs a portion of a parallel sorting operation on the input data. In the first mode, the accumulator circuit receives and stores first results of the multiply-add operations. In the second mode, the accumulator circuit receives and stores second results of the parallel sorting operation.
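A single operation circuit's two modes can be sketched as one function: mode 1 performs a multiply-add of a convolution, while mode 2 performs one compare-exchange step, a plausible "portion of a parallel sorting operation" such as a sorting-network stage (the exact sorting primitive is an assumption; the patent abstract does not specify it).

```python
def operate(mode, a, b, acc=0):
    """Mode 1: multiply-add for convolution, result goes to the accumulator.
    Mode 2: compare-exchange of two inputs, returning (min, max) as one
    step of a parallel sorting network (illustrative choice of primitive)."""
    if mode == 1:
        return acc + a * b
    return (a, b) if a <= b else (b, a)
```

In the first mode the accumulator circuit stores the running sums; in the second it stores the partially ordered results for the next sorting stage.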

    PROCESSING NON-POWER-OF-TWO WORK UNIT IN NEURAL PROCESSOR CIRCUIT

    Publication No.: US20220222509A1

    Publication Date: 2022-07-14

    Application No.: US17148358

    Application Date: 2021-01-13

    Applicant: Apple Inc.

    Abstract: A neural processor includes one or more neural engine circuits for performing convolution operations on input data corresponding to one or more tasks to generate output data. The neural engine circuits process the input data having a power-of-two (P2) shape. The neural processor circuit also includes a data processor circuit. The data processor circuit fetches source data having a non-power-of-two (NP2) shape. The source data may correspond to data of a machine learning model. The data processor circuit also reshapes the source data to generate reshaped source data with the P2 shape. The data processor circuit further sends the reshaped source data to the one or more neural engine circuits as the input data for performing convolution operations. In some cases, the data processor circuit may also perform padding on the source data before the source data is reshaped to the P2 shape.
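The NP2-to-P2 reshaping can be sketched along one dimension as zero-padding to the next power-of-two length; the padding value and one-dimensional view are illustrative assumptions (the patent covers multi-dimensional tensor shapes).

```python
def next_pow2(n):
    """Smallest power of two greater than or equal to n (n >= 1)."""
    p = 1
    while p < n:
        p <<= 1
    return p

def reshape_to_p2(row):
    """Pad a non-power-of-two (NP2) row of source data so its length is a
    power of two (P2), matching the shape the neural engines process."""
    return row + [0] * (next_pow2(len(row)) - len(row))
```

The data processor circuit performs this before dispatching the reshaped source data to the neural engine circuits as input data for convolution.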

    TEXTURE UNIT CIRCUIT IN NEURAL NETWORK PROCESSOR

    Publication No.: US20220138553A1

    Publication Date: 2022-05-05

    Application No.: US17086023

    Application Date: 2020-10-30

    Applicant: Apple Inc.

    Abstract: Embodiments of the present disclosure relate to a texture unit circuit in a neural processor circuit. The neural processor circuit includes a tensor access operation circuit with the texture unit circuit, a data processor circuit, and at least one neural engine circuit. The texture unit circuit fetches a source tensor from a system memory by referencing an index tensor in the system memory representing indexing information into the source tensor. The data processor circuit stores an output version of the source tensor obtained from the tensor access operation circuit and sends the output version of the source tensor as multiple units of input data to the at least one neural engine circuit. The at least one neural engine circuit performs at least convolution operations on the units of input data and at least one kernel to generate output data.

    SPLITTING OF INPUT DATA FOR PROCESSING IN NEURAL NETWORK PROCESSOR

    Publication No.: US20190340501A1

    Publication Date: 2019-11-07

    Application No.: US15971332

    Application Date: 2018-05-04

    Applicant: Apple Inc.

    Abstract: Embodiments of the present disclosure relate to splitting input data into smaller units for loading into a data buffer and neural engines in a neural processor circuit for performing neural network operations. Input data of a large size is split into slices, and each slice is further split into tiles. Each tile is uploaded from an external source to a data buffer inside the neural processor circuit but outside the neural engines. Each tile is again split into work units sized for storing in an input buffer circuit inside each neural engine. The input data stored in the data buffer and the input buffer circuit is reused by the neural engines to reduce re-fetching of input data. Operations of splitting the input data are performed at various components of the neural processor circuit under the management of rasterizers provided in these components.
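The slice → tile → work-unit hierarchy can be sketched in one dimension; the chunk sizes and function names are illustrative assumptions (real slices and tiles are multi-dimensional and sized to the data buffer and input buffer capacities).

```python
def split(seq, size):
    """Split a sequence into consecutive chunks of at most `size` elements."""
    return [seq[i:i + size] for i in range(0, len(seq), size)]

def plan(input_data, slice_size, tile_size, work_unit_size):
    """Hierarchical splitting described above: input data -> slices ->
    tiles (staged in the data buffer) -> work units (sized for each
    neural engine's input buffer circuit)."""
    return [[split(tile, work_unit_size)
             for tile in split(slc, tile_size)]
            for slc in split(input_data, slice_size)]
```

In hardware, rasterizers at each component walk this hierarchy so that buffered tiles and work units are reused rather than re-fetched from external memory.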
