Multi-mode planar engine for neural processor

    Publication No.: US12229657B2

    Publication Date: 2025-02-18

    Application No.: US16596439

    Filing Date: 2019-10-08

    Applicant: Apple Inc.

    Abstract: Embodiments relate to a neural processor that includes a plurality of neural engine circuits and one or more planar engine circuits. The plurality of neural engine circuits can perform convolution operations of input data of the neural engine circuits with one or more kernels to generate outputs. The planar engine circuit is coupled to the plurality of neural engine circuits. The planar engine circuit generates an output from input data that corresponds to the output of the neural engine circuits or to a version of the input data of the neural processor. The planar engine circuit can be configured to operate in multiple modes. In a pooling mode, the planar engine circuit reduces a spatial size of a version of the input data. In an elementwise mode, the planar engine circuit performs an elementwise operation on the input data. In a reduction mode, the planar engine circuit reduces the rank of a tensor.
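
    The following NumPy sketch is not taken from the patent; it only illustrates, in software, the kind of computation each of the three planar-engine modes described above performs on a tensor. The function names (pool_2x2, elementwise_add, reduce_rank) are hypothetical.

        import numpy as np

        def pool_2x2(x):
            # Pooling mode (illustrative): shrink the spatial size by averaging 2x2 windows.
            c, h, w = x.shape
            return x.reshape(c, h // 2, 2, w // 2, 2).mean(axis=(2, 4))

        def elementwise_add(a, b):
            # Elementwise mode (illustrative): combine two tensors element by element.
            return a + b

        def reduce_rank(x, axis=-1):
            # Reduction mode (illustrative): collapse one dimension, lowering the tensor's rank.
            return x.sum(axis=axis)

        x = np.random.rand(8, 4, 4)          # channels x height x width
        print(pool_2x2(x).shape)             # (8, 2, 2)
        print(elementwise_add(x, x).shape)   # (8, 4, 4)
        print(reduce_rank(x).shape)          # (8, 4) -- rank reduced from 3 to 2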

    ASYNCHRONOUS TASK EXECUTION FOR NEURAL PROCESSOR CIRCUIT

    Publication No.: US20230081023A1

    Publication Date: 2023-03-16

    Application No.: US17989275

    Filing Date: 2022-11-17

    Applicant: Apple Inc.

    Abstract: Embodiments relate to a neural processor circuit including one or more planar engine circuits that perform non-convolution operations in parallel with convolution operations performed by one or more neural engine circuits. The neural engine circuits perform the convolution operations on neural input data corresponding to one or more neural engine tasks to generate neural output data. The planar engine circuits perform non-convolution operations on planar input data corresponding to one or more planar engine tasks to generate planar output data. A data processor circuit in the neural processor circuit addresses data dependency between the one or more neural engine tasks and the one or more planar engine tasks by controlling reading of the neural output data as the planar input data by the planar engine circuits or reading of the planar output data as the neural input data by the neural engine circuits.
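
    As a software analogy only (not the patented circuit), the sketch below models how a data processor might hold back a task on one engine until the data it reads has been produced by the other engine; the task-tuple format and names are hypothetical.

        from collections import deque

        def run_tasks(tasks):
            # tasks: list of (engine, buffer_it_writes, buffers_it_reads); hypothetical format.
            ready = set()                 # buffers whose data has already been written
            pending = deque(tasks)
            order = []
            while pending:
                engine, produces, consumes = pending[0]
                if all(c in ready for c in consumes):
                    pending.popleft()
                    order.append((engine, produces))
                    ready.add(produces)   # the output is now readable by the other engine
                else:
                    pending.rotate(-1)    # defer this task until its dependency is satisfied
            return order

        # The planar task consumes the neural task's output, so it runs second.
        print(run_tasks([
            ("planar", "P0", ["N0"]),
            ("neural", "N0", []),
        ]))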

    Reduction mode of planar engine in neural processor

    Publication No.: US11537864B2

    Publication Date: 2022-12-27

    Application No.: US16695782

    Filing Date: 2019-11-26

    Applicant: Apple Inc.

    Abstract: Embodiments relate to a neural processor that includes one or more neural engine circuits and one or more planar engine circuits. The neural engine circuits can perform convolution operations of input data with one or more kernels to generate outputs. The planar engine circuit is coupled to the one or more neural engine circuits and can be configured to operate in multiple modes. In a reduction mode, the planar engine circuit may process values arranged in one or more dimensions of the input to generate a reduced value. The reduced values across multiple pieces of input data may be accumulated. The planar engine circuit may program a filter circuit as a reduction tree to gradually reduce the data into a reduced value. The reduction operation reduces the size of one or more dimensions of a tensor.
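
    The sketch below is a software stand-in for the reduction-tree idea, not the patented filter circuit: values are combined pairwise, stage by stage, until a single reduced value remains, and the reduced values from several pieces of input data are accumulated.

        import numpy as np

        def reduction_tree(values, op=np.add):
            # Combine neighbouring values stage by stage until one value remains.
            values = list(values)
            while len(values) > 1:
                if len(values) % 2:     # pad odd-length stages (zero is neutral for addition)
                    values.append(np.zeros_like(values[0]))
                values = [op(values[i], values[i + 1]) for i in range(0, len(values), 2)]
            return values[0]

        # Accumulate the reduced values across multiple pieces of input data.
        accumulator = 0.0
        for chunk in np.split(np.arange(16, dtype=np.float64), 4):
            accumulator += reduction_tree(chunk)
        print(accumulator)              # 120.0, the sum of 0..15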

    BRANCHING OPERATION FOR NEURAL PROCESSOR CIRCUIT

    Publication No.: US20220237439A1

    Publication Date: 2022-07-28

    Application No.: US17155896

    Filing Date: 2021-01-22

    Applicant: Apple Inc.

    Abstract: A neural processor includes neural engines for performing convolution operations on input data corresponding to one or more tasks to generate output data. The neural processor circuit also includes a data processor circuit that is coupled to the one or more neural engines. The data processor circuit receives the output data from the neural engines and generates a branching command from the output data. The neural processor circuit further includes a task manager that is coupled to the data processor circuit. The task manager receives the branching command from the data processor circuit and enqueues one of two or more segment branches according to the received command. The two or more segment branches are subsequent to a pre-branch task segment that includes the pre-branch task. The task manager transmits a task from the selected segment branch to the data processor circuit to perform the task.
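
    Below is a minimal software model of this branching flow, with hypothetical names; it only shows a branching command, derived from the pre-branch output, selecting which segment branch the task manager enqueues.

        from collections import deque

        def branching_command(output_data, threshold=0.5):
            # Illustrative rule: branch on whether the pre-branch result exceeds a threshold.
            return 0 if output_data > threshold else 1

        def enqueue_branch(task_queue, segment_branches, command):
            # Enqueue every task of the selected segment branch.
            for task in segment_branches[command]:
                task_queue.append(task)

        task_queue = deque(["pre_branch_task"])
        segment_branches = [["branch0_task_a", "branch0_task_b"],
                            ["branch1_task_a"]]

        pre_branch_output = 0.8                   # produced by the neural engines
        cmd = branching_command(pre_branch_output)
        enqueue_branch(task_queue, segment_branches, cmd)
        print(list(task_queue))   # ['pre_branch_task', 'branch0_task_a', 'branch0_task_b']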

    Task context switch for neural processor circuit

    Publication No.: US12229586B2

    Publication Date: 2025-02-18

    Application No.: US17155878

    Filing Date: 2021-01-22

    Applicant: Apple Inc.

    Abstract: A neural processor includes neural engines for performing convolution operations on input data corresponding to one or more tasks to generate output data. The neural processor also includes a data processor circuit coupled to external system memory. The data processor circuit includes a buffer for storing the output data from the neural engines. The neural processor further includes a task manager coupled to the data processor circuit. The task manager receives a context-switch task. The context-switch task specifies a switch of the data processor circuit from handling an outgoing task to an incoming task. The task manager sends configuration data of the context-switch task to cause the data processor circuit to transmit the output data corresponding to the outgoing task from the buffer to the external system memory. The data processor circuit also fetches data corresponding to the incoming task from the external system memory to the buffer.
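
    A minimal software analogy of the context-switch flow is sketched below (the class and field names are hypothetical, and this is not the hardware design): the outgoing task's buffered output is written back to system memory, and the incoming task's data is fetched into the buffer.

        class DataProcessor:
            def __init__(self):
                self.buffer = {}          # on-chip buffer: task id -> data
                self.system_memory = {}   # external system memory

            def context_switch(self, outgoing, incoming):
                # Flush the outgoing task's output from the buffer to system memory.
                if outgoing in self.buffer:
                    self.system_memory[outgoing] = self.buffer.pop(outgoing)
                # Fetch the incoming task's data from system memory into the buffer.
                if incoming in self.system_memory:
                    self.buffer[incoming] = self.system_memory.pop(incoming)

        dp = DataProcessor()
        dp.buffer["task_A"] = [1, 2, 3]           # outgoing task's output data
        dp.system_memory["task_B"] = [4, 5]       # incoming task's saved data
        dp.context_switch(outgoing="task_A", incoming="task_B")
        print(dp.buffer, dp.system_memory)        # task_B now buffered, task_A saved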

    Broadcasting mode of planar engine for neural processor

    Publication No.: US12124943B2

    Publication Date: 2024-10-22

    Application No.: US18120218

    Filing Date: 2023-03-10

    Applicant: Apple Inc.

    CPC classification number: G06N3/063 G06F7/78 G06F9/542 G06N3/084 G06N20/10

    Abstract: Embodiments relate to a neural processor that includes one or more neural engine circuits and one or more planar engine circuits. The neural engine circuits can perform convolution operations of input data with one or more kernels to generate outputs. The planar engine circuit is coupled to the one or more neural engine circuits and can be configured to operate in multiple modes. In an elementwise mode, the planar engine circuit may combine two tensors by performing operations element by element. The planar engine circuit may support elementwise operations on two tensors of different sizes and ranks. The planar engine circuit may perform a broadcasting operation that duplicates one or more values across one or more channels so that the smaller tensor matches the size of the larger tensor.
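
    The sketch below illustrates the broadcasting idea with NumPy semantics only (it is not the circuit's implementation): the smaller tensor's values are duplicated across channels so that it matches the larger tensor before the elementwise combination.

        import numpy as np

        def broadcast_elementwise(small, large, op=np.add):
            # Duplicate the smaller tensor's values along the channel dimension,
            # then combine the two tensors element by element.
            expanded = np.broadcast_to(small, large.shape)
            return op(expanded, large)

        large = np.arange(24, dtype=np.float64).reshape(4, 2, 3)  # 4 channels of 2x3
        small = np.ones((1, 2, 3))                                # single channel
        print(broadcast_elementwise(small, large).shape)          # (4, 2, 3)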

    Asynchronous task execution for neural processor circuit

    Publication No.: US11934941B2

    Publication Date: 2024-03-19

    Application No.: US17989275

    Filing Date: 2022-11-17

    Applicant: Apple Inc.

    Abstract: A neural processor circuit includes one or more planar engine circuits that perform non-convolution operations in parallel with convolution operations performed by one or more neural engine circuits. The neural engine circuits perform the convolution operations on neural input data corresponding to one or more neural engine tasks to generate neural output data. The planar engine circuits perform non-convolution operations on planar input data corresponding to one or more planar engine tasks to generate planar output data. A data processor circuit in the neural processor circuit addresses data dependency between the one or more neural engine tasks and the one or more planar engine tasks by controlling reading of the neural output data as the planar input data by the planar engine circuits or reading of the planar output data as the neural input data by the neural engine circuits.

    BROADCASTING MODE OF PLANAR ENGINE FOR NEURAL PROCESSOR

    Publication No.: US20230206051A1

    Publication Date: 2023-06-29

    Application No.: US18120218

    Filing Date: 2023-03-10

    Applicant: Apple Inc.

    CPC classification number: G06N3/063 G06N3/084 G06F7/78 G06F9/542 G06N20/10

    Abstract: Embodiments relate to a neural processor that includes one or more neural engine circuits and one or more planar engine circuits. The neural engine circuits can perform convolution operations of input data with one or more kernels to generate outputs. The planar engine circuit is coupled to the one or more neural engine circuits and can be configured to operate in multiple modes. In an elementwise mode, the planar engine circuit may combine two tensors by performing operations element by element. The planar engine circuit may support elementwise operations on two tensors of different sizes and ranks. The planar engine circuit may perform a broadcasting operation that duplicates one or more values across one or more channels so that the smaller tensor matches the size of the larger tensor.

    ASYNCHRONOUS TASK EXECUTION FOR NEURAL PROCESSOR CIRCUIT

    Publication No.: US20210271958A1

    Publication Date: 2021-09-02

    Application No.: US16806798

    Filing Date: 2020-03-02

    Applicant: Apple Inc.

    Abstract: Embodiments relate to a neural processor circuit including one or more planar engine circuits that perform non-convolution operations in parallel with convolution operations performed by one or more neural engine circuits. The neural engine circuits perform the convolution operations on neural input data corresponding to one or more neural engine tasks to generate neural output data. The planar engine circuits perform non-convolution operations on planar input data corresponding to one or more planar engine tasks to generate planar output data. A data processor circuit in the neural processor circuit addresses data dependency between the one or more neural engine tasks and the one or more planar engine tasks by controlling reading of the neural output data as the planar input data by the planar engine circuits or reading of the planar output data as the neural input data by the neural engine circuits.
