PROCESSING OF ASYMMETRICALLY QUANTIZED INPUT AND KERNEL COEFFICIENTS IN NEURAL NETWORK PROCESSOR

    公开(公告)号:US20240329929A1

    公开(公告)日:2024-10-03

    申请号:US18127528

    申请日:2023-03-28

    Applicant: Apple Inc.

    CPC classification number: G06F7/523 G06F7/50

    Abstract: Embodiments relate to performing multiply-accumulator operation on asymmetrically quantized input data and kernel data in a neural processor. Instead of adjusting to the input data at a multiply-accumulator to account for the asymmetric quantization of the input data, an adjusted bias for the multiply-accumulator operation is computed beforehand and stored in the multiply-accumulator. On the other hand, kernel coefficients derived from the kernel data are adjusted at the multiply-accumulator to account for the asymmetric quantization. In this way, computational complexity associated with asymmetric quantization may be reduced while increasing the efficiency of the convolution operations at the neural processor.

    STOCHASTIC ROUNDING FOR NEURAL PROCESSOR CIRCUIT

    公开(公告)号:US20230236799A1

    公开(公告)日:2023-07-27

    申请号:US17582939

    申请日:2022-01-24

    Applicant: Apple Inc.

    Abstract: Embodiments relate to a neural processor circuit that includes a neural engine and a post-processing circuit. The neural engine performs a computational task related to a neural network to generate a processed value. The post-processing circuit includes a random bit generator, an adder circuit and a rounding circuit. The random bit generator generates a random string of bits. The adder circuit adds the random string of bits to a version of the processed value to generate an added value. The rounding circuit truncates the added value to generate an output value of the computational task. The random bit generator may include a linear-feedback shift register (LFSR) that generates random numbers based on a seed. The seed may be derived from a master seed that is specific to a task of the neural network.

    TASK CONTEXT SWITCH FOR NEURAL PROCESSOR CIRCUIT

    公开(公告)号:US20220237438A1

    公开(公告)日:2022-07-28

    申请号:US17155878

    申请日:2021-01-22

    Applicant: Apple Inc.

    Abstract: A neural processor includes neural engines for performing convolution operations on input data corresponding to one or more tasks to generate output data. The neural processor also includes a data processor circuit coupled to external system memory. The data processor circuit includes a buffer for storing the output data from the neural engines. The neural processor further includes a task manager coupled to the data processor circuit. The task manager receives a context-switch task. The context-switch task specifies a switch of the data processor circuit from handling an outgoing task to an incoming task. The task manager sends configuration data of the context-switch task to cause the data processor circuit to transmit the output data corresponding to the outgoing task from the buffer to the external system memory. The data processor circuit also fetches data corresponding to the incoming task from the external system memory to the buffer.

    Branching operation for neural processor circuit

    公开(公告)号:US12210959B2

    公开(公告)日:2025-01-28

    申请号:US17155896

    申请日:2021-01-22

    Applicant: Apple Inc.

    Abstract: A neural processor includes neural engines for performing convolution operations on input data corresponding to one or more tasks to generate output data. The neural processor circuit also includes a data processor circuit that is coupled to one or more neural engine. The data processor circuit receives the output data from the neural engine and generates a branching command from the output data. The neural processor circuit further includes a task manager that is coupled to the data processor circuit. The task manager receives the branching command from the data processor circuit. The task manager enqueues one of two or more segment branches according to the received branching command. The two or more segment branches are subsequent to a pre-branch task segment that includes the pre-branch task. The task manager transmits a task from the selected one of the segment branches to data processor circuit to perform the task.

    Broadcasting mode of planar engine for neural processor

    公开(公告)号:US11630991B2

    公开(公告)日:2023-04-18

    申请号:US16781824

    申请日:2020-02-04

    Applicant: Apple Inc.

    Abstract: Embodiments relate to a neural processor that includes one or more neural engine circuits and planar engine circuits. The neural engine circuits can perform convolution operations of input data with one or more kernels to generate outputs. The planar engine circuit is coupled to the plurality of neural engine circuits. A planar engine circuit can be configured to multiple modes. In an elementwise mode, the planar engine circuit may combine two tensors by performing operations element by element. The planar engine circuit may support elementwise operation for two tensors that are in different sizes and ranks. The planar engine circuit may perform a broadcasting operation to duplicate one or more values across one or more channels to make a smaller tensor matching the size of the larger tensor.

    Ternary mode of planar engine for neural processor

    公开(公告)号:US11604975B2

    公开(公告)日:2023-03-14

    申请号:US16844964

    申请日:2020-04-09

    Applicant: Apple Inc.

    Abstract: A neural processor includes one or more neural engine circuits and a planar engine circuit. The neural engine circuits can perform convolution operations of first input data with one or more kernels to generate a first output. The planar engine circuit receives second input data that corresponds to a version of the first input data. The planar engine circuit also receives third input data that includes fourth input data and fifth input data stored together in a dimension of third input data. The planar engine circuit performs a first elementwise operation between a version of the second input data and a version of the fourth input data to generate intermediate data. The planar engine circuit performs a second elementwise operation between the intermediate data and a version of the fifth input data to generate a second output.

    Asynchronous task execution for neural processor circuit

    公开(公告)号:US11599780B2

    公开(公告)日:2023-03-07

    申请号:US16806798

    申请日:2020-03-02

    Applicant: Apple Inc.

    Abstract: A neural processor circuit including one or more planar engine circuits that perform non-convolution operations in parallel with convolution operations performed by one or more neural engine circuits. The neural engine circuits perform the convolution operations on neural input data corresponding to one or more neural engine tasks to generate neural output data. The planar engine circuits perform non-convolution operations on planar input data corresponding to one or more planar engine tasks to generate planar output data. A data processor circuit in the neural processor circuit addresses data dependency between the one or more neural engine tasks and the one or more planar engine tasks by controlling reading of the neural output data as the planar input data by the planar engine circuits or reading of the planar output data as the neural input data by the neural engine circuits.

    TERNARY MODE OF PLANAR ENGINE FOR NEURAL PROCESSOR

    公开(公告)号:US20210319290A1

    公开(公告)日:2021-10-14

    申请号:US16844964

    申请日:2020-04-09

    Applicant: Apple Inc.

    Abstract: A neural processor includes one or more neural engine circuits and a planar engine circuit. The neural engine circuits can perform convolution operations of first input data with one or more kernels to generate a first output. The planar engine circuit receives second input data that corresponds to a version of the first input data. The planar engine circuit also receives third input data that includes fourth input data and fifth input data stored together in a dimension of third input data. The planar engine circuit performs a first elementwise operation between a version of the second input data and a version of the fourth input data to generate intermediate data. The planar engine circuit performs a second elementwise operation between the intermediate data and a version of the fifth input data to generate a second output.

    Secure Exclaves
    9.
    发明申请

    公开(公告)号:US20250094563A1

    公开(公告)日:2025-03-20

    申请号:US18790529

    申请日:2024-07-31

    Applicant: Apple Inc.

    Abstract: Techniques are disclosed relating to securing hardware accelerators used by a computing device. In some embodiments, a computing device includes one or more processors configured to co-execute trusted processes and untrusted processes in an isolated manner that includes implementing a secure environment in which a set of security criteria is enforced for data of the trusted processes. The computing device further includes multiple heterogenous hardware accelerators configured to implement exclaves of the secure environment that extend enforcement of one or more of the set of security criteria within the hardware accelerators for data distributed to the hardware accelerators for performance of tasks associated with the trusted processes.

    Multi-mode planar engine for neural processor

    公开(公告)号:US12229657B2

    公开(公告)日:2025-02-18

    申请号:US16596439

    申请日:2019-10-08

    Applicant: Apple Inc.

    Abstract: Embodiments relate to a neural processor that include a plurality of neural engine circuits and one or more planar engine circuits. The plurality of neural engine circuits can perform convolution operations of input data of the neural engine circuits with one or more kernels to generate outputs. The planar engine circuit is coupled to the plurality of neural engine circuits. The planar engine circuit generates an output from input data that corresponds to output of the neural engine circuits or a version of input data of the neural processor. The planar engine circuit can be configured to multiple modes. In a pooling mode, the planar engine circuit reduces a spatial size of a version of the input data. In an elementwise mode, the planar engine circuit performs an elementwise operation on the input data. In a reduction mode, the planar engine circuit reduces the rank of a tensor.

Patent Agency Ranking